LIMIT THEOREMS FOR SOME ADAPTIVE MCMC ALGORITHMS 
WITH SUBGEOMETRIC KERNELS 
Abstract. This paper deals with the ergodicity (convergence of the marginals) and the law of large numbers for adaptive MCMC algorithms built from transition kernels that are not necessarily geometrically ergodic. We develop a number of results that significantly broaden the class of adaptive MCMC algorithms for which rigorous analysis is now possible. As an example, we give a detailed analysis of the Adaptive Metropolis Algorithm of Haario et al. (2001) when the target distribution is sub-exponential in the tails.



1. Introduction 

This paper deals with the convergence of Adaptive Markov Chain Monte Carlo (AMCMC). Markov Chain Monte Carlo (MCMC) is a well-known, widely used method to sample from arbitrary probability distributions. One of the major limitations of the method is the difficulty in finding sensible values for the parameters of the Markov kernels. Adaptive MCMC provides a general framework to tackle this problem, in which the parameters are adaptively tuned, often using previously generated samples. This approach generates a class of stochastic processes that is the object of this paper.

Denote by $\pi$ the probability measure of interest on some measurable space $(\mathsf{X},\mathcal{X})$. Let $\{P_\theta, \theta\in\Theta\}$ be a family of $\phi$-irreducible and aperiodic Markov kernels, each with invariant distribution $\pi$. We are interested in the class of stochastic processes based on non-homogeneous Markov chains $\{(X_n,\theta_n), n\ge0\}$ with transition kernels $\{P(n;(x,\theta);(dx',d\theta')), n\ge0\}$ satisfying $\int_\Theta P(n;(x,\theta);(\cdot,d\theta')) = P_\theta(x,\cdot)$. Often, these transition kernels are of the form $\{P_\theta(x,dy)\,\delta_{H_{n+1}(\theta,y)}(d\theta'), n\ge0\}$, where $\{H_l, l\ge0\}$ is a family of measurable functions $H_l:\Theta\times\mathsf{X}\to\Theta$. The stochastic approximation dynamic corresponds to the case $H_l(\theta,x) = \theta + \gamma_l H(\theta,x)$. In this latter case, it is assumed that the best values for $\theta$ are the solutions of the equation $\int H(\theta,x)\,\pi(dx) = 0$. Since the pioneering work of Gilks et al. (1998); Holden (1998); Haario et al. (2001); Andrieu and Robert (2001), the number of AMCMC algorithms in the literature has increased significantly. But despite many recent works on the topic, the asymptotic behavior of these algorithms is still not completely understood. Almost all previous works on the convergence of AMCMC are limited to the case when each kernel $P_\theta$ is geometrically ergodic (see e.g. Roberts and Rosenthal (2007); Andrieu and Moulines (2006)). In this paper, we weaken this condition and consider the case when each transition kernel is sub-geometrically ergodic.

More specifically, we study the ergodicity of the marginal $\{X_n, n\ge0\}$, i.e. the convergence to $\pi$ of the distribution of $X_n$ irrespective of the initial distribution, and the existence of a strong law of large numbers for AMCMC.

We first show that a diminishing adaptation assumption of the form $|\theta_n - \theta_{n-1}|\to0$, in a sense to be made precise (assumption B1), together with a uniform-in-$\theta$ positive recurrence towards a small set $C$ (assumptions A1(i) and A1(iii)) and a uniform-in-$\theta$ ergodicity condition on the kernels $\{P_\theta, \theta\in\Theta\}$ (assumption A1(ii)), is enough to imply the ergodicity of AMCMC.

We believe that this result is close to optimal. Indeed, it is well documented in the literature that AMCMC can fail to be ergodic if the diminishing adaptation assumption does not hold (see e.g. Roberts and Rosenthal (2007) for examples). Furthermore, the additional assumptions are fairly weak: in the case where $\Theta$ is reduced to a single point $\{\theta_\star\}$, so that $\{X_n, n\ge0\}$ is a Markov chain with transition kernel $P_{\theta_\star}$, these conditions hold if $P_{\theta_\star}$ is an aperiodic, positive Harris recurrent kernel that is polynomially ergodic.

We then prove a strong law of large numbers for AMCMC. We show that the diminishing adaptation assumption and a uniform-in-$\theta$ polynomial drift condition towards a small set $C$ of the form $P_\theta V \le V - cV^{1-\alpha} + b\mathbb{1}_C(x)$, $\alpha\in(0,1)$, imply a strong law of large numbers for all real-valued measurable functions $f$ for which $\sup_x(|f|/V^\beta) < \infty$, $\beta\in[0,1-\alpha)$. This result is close to what can be achieved with Markov chains (with a fixed transition kernel) under similar conditions (Meyn and Tweedie (1993)).

On a more technical note, this paper makes two key contributions to the analysis of AMCMC. Firstly, to study the ergodicity, we use a more careful coupling technique which extends the coupling approach of Roberts and Rosenthal (2007). Secondly, we tackle the law of large numbers using a resolvent kernel approach together with martingale theory. This approach has a decisive advantage over the more classical Poisson equation approach (Andrieu and Moulines (2006)) in that no continuity property of the resolvent kernels is required. It is also worth noting that the results developed in this paper can be applied to adaptive Markov chains beyond Markov Chain Monte Carlo simulation, provided all the transition kernels have the same invariant distribution.

The remainder of the paper is organized as follows. In Section 2 we state our assumptions, followed by a statement of our main results. A detailed discussion of the assumptions and some comparison with the literature are provided in Section 2.4. We apply our results to the analysis of the Adaptive Random Walk Metropolis algorithm of Haario et al. (2001) when the target distribution is sub-exponential in the tails. This is covered in Section 3, together with a toy example taken from Atchade and Rosenthal (2005). All the proofs are postponed to Section 4.

2. Statement of the results and discussion 

2.1. Notations. For a transition kernel $P$ on a measurable general state space $(\mathsf{T},\mathcal{B}(\mathsf{T}))$, denote by $P^n$, $n\ge0$, its $n$-th iterate defined as
$$P^0(x,A) := \delta_x(A), \qquad P^{n+1}(x,A) := \int P(x,dy)\,P^n(y,A), \quad n\ge0;$$
$\delta_x(dt)$ stands for the Dirac mass at $\{x\}$. $P^n$ is a transition kernel on $(\mathsf{T},\mathcal{B}(\mathsf{T}))$ that acts both on bounded measurable functions $f$ on $\mathsf{T}$ and on $\sigma$-finite measures $\mu$ on $(\mathsf{T},\mathcal{B}(\mathsf{T}))$ via $P^nf(\cdot) := \int P^n(\cdot,dy)f(y)$ and $\mu P^n(\cdot) := \int\mu(dx)P^n(x,\cdot)$.
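On a finite state space $\mathsf{T} = \{0,\dots,K-1\}$, a transition kernel is a row-stochastic matrix, $P^n$ is a matrix power and the two actions above are matrix-vector products. The following NumPy sketch illustrates the definitions with an arbitrary, hypothetical 3-state kernel.

```python
import numpy as np

def iterate_kernel(P, n):
    """n-th iterate P^n of a finite-state transition matrix; P^0 is the identity."""
    return np.linalg.matrix_power(P, n)

# Hypothetical 3-state kernel on T = {0, 1, 2}; row x is P(x, .).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
f = np.array([1.0, -2.0, 3.0])   # a bounded function on T
mu = np.array([1.0, 0.0, 0.0])   # the Dirac mass delta_0

P4 = iterate_kernel(P, 4)
P4_f = P4 @ f      # (P^4 f)(x)  = sum_y P^4(x, y) f(y)
mu_P4 = mu @ P4    # (mu P^4)(y) = sum_x mu(x) P^4(x, y)
```

Since $\mu = \delta_0$ here, $\mu P^4(f)$ coincides with $P^4 f(0)$.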

If $V:\mathsf{T}\to[1,+\infty)$ is a function, the $V$-norm of a function $f:\mathsf{T}\to\mathbb{R}$ is defined as $|f|_V := \sup_{\mathsf{T}}|f|/V$. When $V = 1$, this is the supremum norm. The set of functions with finite $V$-norm is denoted by $\mathcal{L}_V$.

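If $\mu$ is a signed measure on a measurable space $(\mathsf{T},\mathcal{B}(\mathsf{T}))$, the total variation norm $\|\mu\|_{TV}$ is defined as
$$\|\mu\|_{TV} := \sup_{\{f,\,|f|_1\le1\}}|\mu(f)| = 2\sup_{A\in\mathcal{B}(\mathsf{T})}|\mu(A)| = \sup_{A\in\mathcal{B}(\mathsf{T})}\mu(A) - \inf_{A\in\mathcal{B}(\mathsf{T})}\mu(A)$$
(the equality with $2\sup_A|\mu(A)|$ requires $\mu(\mathsf{T}) = 0$, e.g. for the difference of two probability measures); and the $V$-norm, for some function $V:\mathsf{T}\to[1,+\infty)$, is defined as $\|\mu\|_V := \sup_{\{g,\,|g|_V\le1\}}|\mu(g)|$.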
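On a finite space the suprema defining $\|\mu\|_{TV}$ and $\|\mu\|_V$ are attained at $f = \mathrm{sign}(\mu)$ and $g = V\,\mathrm{sign}(\mu)$. The sketch below, using a hypothetical difference of two probability vectors, checks the three expressions for the total variation norm by brute force over all sets $A$.

```python
import numpy as np
from itertools import combinations

def tv_norm(mu):
    # sup_{|f|_1 <= 1} |mu(f)| is attained at f = sign(mu): the sum of |mu(x)|
    return np.abs(mu).sum()

def v_norm(mu, V):
    # sup_{|g|_V <= 1} |mu(g)| is attained at g = V * sign(mu)
    return (np.abs(mu) * V).sum()

def sup_inf_over_sets(mu):
    # brute force over all subsets A of {0, ..., K-1} (small K only)
    values = [mu[list(A)].sum()
              for r in range(len(mu) + 1)
              for A in combinations(range(len(mu)), r)]
    return max(values), min(values)

# Difference of two hypothetical probability vectors, so mu(T) = 0.
mu = np.array([0.2, 0.5, 0.3]) - np.array([0.4, 0.4, 0.2])
V = np.array([1.0, 2.0, 4.0])
sup_A, inf_A = sup_inf_over_sets(mu)
```

Because $V\ge1$, the $V$-norm always dominates the total variation norm.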

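Let $\mathsf{X}$, $\Theta$ be two general state spaces, resp. endowed with countably generated $\sigma$-fields $\mathcal{X}$ and $\mathcal{B}(\Theta)$. Let $\{P_\theta, \theta\in\Theta\}$ be a family of Markov transition kernels on $(\mathsf{X},\mathcal{X})$ such that for any $(x,A)\in\mathsf{X}\times\mathcal{X}$, $\theta\mapsto P_\theta(x,A)$ is measurable. Let $\{P(n;\cdot,\cdot), n\ge0\}$ be a family of transition kernels on $(\mathsf{X}\times\Theta,\mathcal{X}\otimes\mathcal{B}(\Theta))$ satisfying, for any $A\in\mathcal{X}$,
$$\int_{A\times\Theta}P(n;(x,\theta),(dx',d\theta')) = P_\theta(x,A). \qquad (1)$$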

An adaptive Markov chain is a non-homogeneous Markov chain $\{Z_n = (X_n,\theta_n), n\ge0\}$ on $\mathsf{X}\times\Theta$ with transition kernels $\{P(n;\cdot;\cdot), n\ge0\}$.

Among examples of such transition kernels, consider the case when $\{(X_n,\theta_n), n\ge0\}$ is obtained through the following algorithm: given $(X_n,\theta_n)$, sample $X_{n+1}\sim P_{\theta_n}(X_n,\cdot)$ and set $\theta_{n+1} = \theta_n$ with probability $1-p_{n+1}$, or set $\theta_{n+1} = \Xi_{n+1}(X_n,\theta_n,X_{n+1})$ with probability $p_{n+1}$. Then
$$P(n;(x,\theta),(dx',d\theta')) = P_\theta(x,dx')\big\{(1-p_{n+1})\,\delta_\theta(d\theta') + p_{n+1}\,\delta_{\Xi_{n+1}(x,\theta,x')}(d\theta')\big\}.$$
A special case is when $p_{n+1} = 1$ and $\theta_{n+1} = H_{n+1}(\theta_n,X_{n+1})$, where $\{H_l, l\ge0\}$ is a family of measurable functions $H_l:\Theta\times\mathsf{X}\to\Theta$. Then
$$P(n;(x,\theta),(dx',d\theta')) = P_\theta(x,dx')\,\delta_{H_{n+1}(\theta,x')}(d\theta').$$
Such a situation occurs, for example, if $\theta_{n+1}$ is updated following a stochastic approximation dynamic: $\theta_{n+1} = \theta_n + \gamma_{n+1}H(\theta_n,X_{n+1})$.
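A minimal sketch of one transition of this scheme, with the kernel sampler, the update map $\Xi$ and the probabilities $p_n$ supplied by the caller. All names are hypothetical, and the usage is an illustration chosen for this sketch, not taken from the text: $P_\theta$ is an independence Metropolis-Hastings kernel with proposal $N(\theta, 2^2)$ targeting $\pi = N(2,1)$ (so every $P_\theta$ leaves $\pi$ invariant), and $\theta$ follows the stochastic approximation dynamic with $H(\theta,x) = x - \theta$, $\gamma_n = 1/n$ and $p_n = 1$.

```python
import numpy as np

def adaptive_step(n, x, theta, sample_kernel, xi, p, rng):
    """(X_n, theta_n) -> (X_{n+1}, theta_{n+1}) for the randomized adaptation scheme."""
    x_new = sample_kernel(theta, x, rng)        # X_{n+1} ~ P_{theta_n}(X_n, .)
    if rng.random() < p(n + 1):                 # adapt with probability p_{n+1}
        theta = xi(n + 1, x, theta, x_new)      # theta_{n+1} = Xi_{n+1}(X_n, theta_n, X_{n+1})
    return x_new, theta

# Hypothetical illustration: independence Metropolis-Hastings kernel with
# proposal N(theta, 2^2) and target pi = N(2, 1).
def log_pi(x):
    return -0.5 * (x - 2.0) ** 2

def sample_kernel(theta, x, rng, scale=2.0):
    y = rng.normal(theta, scale)
    def log_q(z):                               # proposal log-density, up to a constant
        return -0.5 * ((z - theta) / scale) ** 2
    log_ratio = log_pi(y) - log_pi(x) + log_q(x) - log_q(y)
    return y if rng.random() < np.exp(min(0.0, log_ratio)) else x

def xi(n, x, theta, x_new):
    # stochastic approximation step with H(theta, x) = x - theta and gamma_n = 1/n
    return theta + (x_new - theta) / n

rng = np.random.default_rng(0)
x, theta = 0.0, -3.0
for n in range(20_000):                         # p_n = 1: the special case H_{n+1}
    x, theta = adaptive_step(n, x, theta, sample_kernel, xi, lambda k: 1.0, rng)
```

With $\gamma_n = 1/n$, $\theta_n$ is the running mean of $X_1,\dots,X_n$, so the adaptation diminishes and $\theta_n$ settles near the mean of $\pi$.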

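From $\{P(n;\cdot,\cdot), n\ge0\}$ and for any integer $l\ge0$, we introduce a family, indexed by $l$, of sequences of transition kernels $\{P_l(n;\cdot,\cdot), n\ge0\}$, where $P_l(n;\cdot,\cdot) := P(l+n;\cdot,\cdot)$, and we denote by $\mathbb{P}^{(l)}_\xi$ and $\mathbb{E}^{(l)}_\xi$ the probability and expectation on the canonical space $(\Omega,\mathcal{F})$ of the canonical non-homogeneous Markov chain $\{Z_n = (X_n,\theta_n), n\ge0\}$ with transition kernels $\{P_l(n;\cdot;\cdot), n\ge0\}$ and initial distribution $\xi$. We denote by $\underline{\theta}$ the shift operator on $\Omega$ and by $\{\mathcal{F}_k, k\ge0\}$ the natural filtration of the process $\{Z_k, k\ge0\}$. We use the notations $\mathbb{P}^{(l)}_{x,\theta}$ and $\mathbb{E}^{(l)}_{x,\theta}$ as shorthand notations for $\mathbb{P}^{(l)}_{\delta_{(x,\theta)}}$ and $\mathbb{E}^{(l)}_{\delta_{(x,\theta)}}$, and $\mathbb{P}_{\xi_1,\xi_2}$, $\mathbb{E}_{\xi_1,\xi_2}$ for $\mathbb{P}^{(0)}_{\xi_1\otimes\xi_2}$, $\mathbb{E}^{(0)}_{\xi_1\otimes\xi_2}$.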



Set
$$D(\theta,\theta') := \sup_{x\in\mathsf{X}}\|P_\theta(x,\cdot) - P_{\theta'}(x,\cdot)\|_{TV}.$$
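For finite-state kernels, $D(\theta,\theta')$ is the largest row-wise $\ell^1$ distance between the two transition matrices. The sketch below computes it for Random Walk Metropolis kernels of the kind used in the toy example of Section 3.1; the target vector `pi` and the widths are hypothetical values, and states $\{1,\dots,K\}$ are coded as $\{0,\dots,K-1\}$.

```python
import numpy as np

def rwm_matrix(pi, theta):
    """RWM kernel on {0, ..., K-1} with proposal uniform on
    {x-theta, ..., x-1, x+1, ..., x+theta}; proposals outside the space are rejected."""
    K = len(pi)
    P = np.zeros((K, K))
    for x in range(K):
        for y in range(x - theta, x + theta + 1):
            if y == x or y < 0 or y >= K:
                continue
            P[x, y] = min(1.0, pi[y] / pi[x]) / (2 * theta)
        P[x, x] = 1.0 - P[x].sum()   # mass of rejected proposals stays at x
    return P

def D(P1, P2):
    # sup over x of the total variation distance (sum of |.|) between rows
    return np.abs(P1 - P2).sum(axis=1).max()

pi = np.array([0.1, 0.2, 0.3, 0.4])        # hypothetical target on K = 4 points
kernels = {t: rwm_matrix(pi, t) for t in (1, 2, 3)}
```

Each `rwm_matrix(pi, t)` satisfies detailed balance with respect to `pi`, so all kernels share the invariant distribution, as required in the setting above.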



2.2. Convergence of the marginals. We assume that minorization, drift conditions and ergodicity are available for $P_\theta$ uniformly in $\theta$. For a set $C$, denote by $\tau_C$ the return time to $C\times\Theta$: $\tau_C := \inf\{n\ge1, X_n\in C\}$.

A1 There exist a measurable function $V:\mathsf{X}\to[1,+\infty)$ and a measurable set $C$ such that
(i) $\sup_l\sup_{C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}[r(\tau_C)] < +\infty$ for some non-decreasing function $r:\mathbb{N}\to(0,+\infty)$ such that $\sum_n 1/r(n) < +\infty$.
(ii) there exists a probability measure $\pi$ such that
$$\lim_{n\to+\infty}\sup_{x\in\mathsf{X}}V^{-1}(x)\sup_{\theta\in\Theta}\|P_\theta^n(x,\cdot) - \pi\|_{TV} = 0.$$
(iii) $\sup_\theta P_\theta V \le V$ on $C^c$ and $\sup_{C\times\Theta}\{P_\theta V(x) + V(x)\} < +\infty$.

B1 There exist probability distributions $\xi_1,\xi_2$ resp. on $\mathsf{X}$, $\Theta$ such that for any $\epsilon>0$, $\lim_n\mathbb{P}_{\xi_1,\xi_2}(D(\theta_n,\theta_{n-1}) > \epsilon) = 0$.

Theorem 2.1. Assume A1 and B1. Then
$$\lim_{n\to+\infty}\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[f(X_n) - \pi(f)]\big| = 0.$$

Sufficient conditions for A1 to hold are the following uniform-in-$\theta$ conditions:
A2 (i) The transition kernels $P_\theta$ are $\phi$-irreducible and aperiodic.
(ii) There exist a function $V:\mathsf{X}\to[1,+\infty)$, $\alpha\in(0,1)$ and constants $b, c$ such that for any $\theta\in\Theta$,
$$P_\theta V(x) \le V(x) - c\,V^{1-\alpha}(x) + b\,\mathbb{1}_C(x).$$
(iii) For any level set $\mathcal{D}$ of $V$, there exist $\epsilon_{\mathcal{D}}>0$ and a probability $\nu_{\mathcal{D}}$ such that for any $\theta$, $P_\theta(x,\cdot) \ge \epsilon_{\mathcal{D}}\,\mathbb{1}_{\mathcal{D}}(x)\,\nu_{\mathcal{D}}(\cdot)$.
We thus have the following corollary.

Corollary 2.2 (of Theorem 2.1). Assume A2 and B1. Then
$$\lim_{n\to+\infty}\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[f(X_n) - \pi(f)]\big| = 0.$$

Assumptions A1(i) and A1(iii) are designed to control the behavior of the chain "far from the center". When the state space $\mathsf{X}$ is "bounded", so that for example $V = 1$ in A1(ii), we have the following result.

Lemma 2.3. If there exists a probability measure $\pi$ such that $\lim_{n\to+\infty}\sup_{\mathsf{X}\times\Theta}\|P_\theta^n(x,\cdot) - \pi(\cdot)\|_{TV} = 0$, then A1(i) and A1(iii) hold with a bounded function $V$ and $C = \mathsf{X}$.

Combining the assumptions of Lemma 2.3 and B1, we deduce from Theorem 2.1 the convergence of the marginals. This result coincides with (Roberts and Rosenthal, 2007, Theorem 5). As observed by Bai (2008) (personal communication), assumption A2 also implies the "containment condition" as defined in Roberts and Rosenthal (2007). Consequently, Corollary 2.2 could also be established by applying (Roberts and Rosenthal, 2007, Theorem 13): this would yield the following statement, which is adapted from Bai (2008). Define $M_\epsilon(x,\theta) := \inf\{n\ge1, \|P_\theta^n(x,\cdot) - \pi(\cdot)\|_{TV} < \epsilon\}$.

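Proposition 2.4. Assume A2 and B1. Then for any $\epsilon>0$, the sequence $\{M_\epsilon(X_n,\theta_n), n\ge0\}$ is bounded in probability for the probability $\mathbb{P}_{\xi_1,\xi_2}$, and
$$\lim_{n\to+\infty}\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[f(X_n) - \pi(f)]\big| = 0.$$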
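For a single finite-state kernel, $M_\epsilon(x,\theta)$ can be computed by iterating the row $P_\theta^n(x,\cdot)$ until it is within $\epsilon$ of $\pi$. A sketch with a hypothetical two-state kernel, which also shows that $M_\epsilon$ depends on the starting point $x$:

```python
import numpy as np

def stationary(P):
    """Invariant probability of a finite irreducible transition matrix (left eigenvector for 1)."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def M_eps(P, x, eps, n_max=10_000):
    """M_eps(x) = inf{n >= 1 : ||P^n(x, .) - pi||_TV < eps}, TV = sum of |.|."""
    pi = stationary(P)
    row = P[x].copy()                  # P^1(x, .)
    for n in range(1, n_max + 1):
        if np.abs(row - pi).sum() < eps:
            return n
        row = row @ P                  # P^{n+1}(x, .) = P^n(x, .) P
    return np.inf                      # not reached within n_max steps

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])             # hypothetical kernel, pi = (2/3, 1/3)
```

Here $\|P^n(0,\cdot)-\pi\|_{TV} = \tfrac23(0.7)^n$ and $\|P^n(1,\cdot)-\pi\|_{TV} = \tfrac43(0.7)^n$, so $M_{0.1}$ equals 6 from $x = 0$ and 8 from $x = 1$.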



2.3. Strong law of large numbers. Assumptions A1 and B1 are strengthened as follows.

A3 There exist a probability measure $\nu$ on $\mathsf{X}$, a positive constant $\epsilon$ and a set $C\in\mathcal{X}$ such that for any $\theta\in\Theta$, $P_\theta(x,\cdot) \ge \mathbb{1}_C(x)\,\epsilon\,\nu(\cdot)$.
A4 There exist a measurable function $V:\mathsf{X}\to[1,+\infty)$, $0<\alpha<1$ and positive constants $b, c$ such that for any $\theta\in\Theta$, $P_\theta V \le V - c\,V^{1-\alpha} + b\,\mathbb{1}_C$.
A5 There exist a probability measure $\pi$ and some $0\le\beta<1-\alpha$ such that for any level set $\mathcal{D} := \{x\in\mathsf{X}, V(x)\le d\}$ of $V$,
$$\lim_{n\to+\infty}\sup_{\mathcal{D}\times\Theta}\|P_\theta^n(x,\cdot) - \pi\|_{V^\beta} = 0.$$
B2 For any level set $\mathcal{D}$ of $V$ and any $\epsilon>0$,
$$\lim_{n\to+\infty}\sup_{l\ge0}\sup_{\mathcal{D}\times\Theta}\mathbb{P}^{(l)}_{x,\theta}\big(D(\theta_n,\theta_{n-1}) > \epsilon\big) = 0.$$
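Theorem 2.5. Assume A3-A5 and B2. Then for any measurable function $f:\mathsf{X}\to\mathbb{R}$ in $\mathcal{L}_{V^\beta}$ and any initial distributions $\xi_1,\xi_2$ resp. on $\mathsf{X}$, $\Theta$ such that $\xi_1(V) < +\infty$,
$$\lim_{n\to+\infty}n^{-1}\sum_{k=1}^n f(X_k) = \pi(f), \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.}$$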

As in the case of the convergence of the marginals, when A5 and B2 hold with $\mathcal{D} = \mathsf{X}$ and $\beta = 0$, A3 and A4 can be omitted. We thus have

Proposition 2.6. Assume that A5 and B2 hold with $\mathcal{D} = \mathsf{X}$ and $\beta = 0$. Then for any measurable bounded function $f:\mathsf{X}\to\mathbb{R}$ and any initial distributions $\xi_1,\xi_2$ resp. on $\mathsf{X}$, $\Theta$,
$$\lim_{n\to+\infty}n^{-1}\sum_{k=1}^n f(X_k) = \pi(f), \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.}$$

2.4. Discussion. 

2.4.1. Non-adaptive case. We start by comparing our assumptions to assumptions in Markov chain theory under which the law of large numbers holds. In the setup above, taking $\Theta = \{\theta_\star\}$ and $H(\theta_\star,x) = \theta_\star$ reduces $\{X_n, n\ge0\}$ to a Markov chain with transition kernel $P_{\theta_\star}$. Assume that $P_{\theta_\star}$ is Harris recurrent.

In that case, a condition which is known to be minimal and to imply ergodicity in total variation norm is that $P_{\theta_\star}$ is an aperiodic positive Harris recurrent transition kernel (Meyn and Tweedie, 1993, Theorems 11.0.1 and 13.0.1). Condition A1(i) is stronger than positive Harris recurrence since it requires $\sup_C\mathbb{E}_x[r(\tau_C)] < +\infty$ for some rate $r$ with $r(n)\gg n$. Nevertheless, as discussed in the proof (see Remark 2, Section 4), the condition $\sum_n 1/r(n) < +\infty$ is really designed for the adaptive case. A1(ii) is stronger than what we want to prove (since A1(ii) implies the conclusion of Theorem 2.1 in the non-adaptive case); this is due to our technique of proof, which is based on the comparison of the adaptive process to a process (namely, a Markov chain with transition kernel $P_\theta$) whose stationary distribution is $\pi$. Our proof is thus designed to address the adaptive case. Finally, B1 is trivially true.

For the strong law of large numbers (Theorem 2.5), B2 is still trivially true in the Markovian case and A5 is implied by A3 and A4 combined with the assumption that $P_{\theta_\star}$ is $\phi$-irreducible and aperiodic (see Appendix A and references therein). In the Markovian case, whenever $P_{\theta_\star}$ is $\phi$-irreducible and aperiodic, A3 and A4 are known sufficient conditions for a strong law of large numbers for $f\in\mathcal{L}_{V^{1-\alpha}}$, which is a bit stronger than the conclusion of Theorem 2.5. This slight loss of efficiency is due to the technique of proof based on martingale theory (see the comments in Section 2.4.5). Observe that in the geometric case, there is the same loss of generality in (Andrieu and Moulines, 2006, Theorem 8). More generally, any proof of the law of large numbers based on martingale theory (through, for example, the use of the Poisson equation or of the resolvent kernel) will incur the same loss of efficiency, since limit theorems exist only for $L^p$-martingales with $p>1$.

2.4.2. Checking assumptions A1(ii) and A5. A1(ii) and A5 are the most technical of our assumptions. Contrary to the case of a single kernel, the relations between A1(ii) (resp. A5) and A1(i)-A3 (resp. A3, A4) are not completely understood. Nevertheless, these assumptions can be checked under conditions which are essentially of the form A3, A4 plus the assumption that each transition kernel $P_\theta$ is $\phi$-irreducible and aperiodic, as discussed in Appendix A.

2.4.3. On the uniformity in $\theta$ in assumptions A1(i), A1(ii), A3 and A4. We have formulated A1(i), A1(ii), A3 and A4 such that all the constants involved are independent of $\theta$, for $\theta\in\Theta$. Intuitively, this corresponds to AMCMC algorithms based on kernels with overall similar ergodicity properties. This uniformity assumption might seem unrealistically strong at first. But the next example shows that when these conditions do not hold uniformly in $\theta\in\Theta$, pathologies can occur if the adaptation parameter can wander to the boundary of $\Theta$.

Example 1. The example is adapted from Winkler (2003). Let $\mathsf{X} = \{0,1\}$ and $\{P_\theta, \theta\in(0,1)\}$ be a family of transition matrices with $P_\theta(0,0) = P_\theta(1,1) = 1-\theta$. Let $\{\theta_n, n\ge0\}$, $\theta_n\in(0,1)$, be a deterministic sequence of real numbers decreasing to 0 and $\{X_n, n\ge0\}$ be a non-homogeneous Markov chain on $\{0,1\}$ with transition matrices $\{P_{\theta_n}, n\ge0\}$. One can check that $D(\theta_n,\theta_{n-1}) \le 2(\theta_{n-1} - \theta_n)$ for all $n\ge1$, so that B1 and B2 hold.

For any compact subset $K$ of $(0,1)$, it can be checked that A1(i), A1(ii), A3 and A4 hold uniformly for all $\theta\in K$. But these assumptions do not hold uniformly for all $\theta\in(0,1)$. Therefore Theorems 2.1 and 2.5 do not apply. Actually, one can easily check that $\mathbb{P}_{x,\theta_0}(X_n\in\cdot)\to\pi(\cdot)$ as $n\to\infty$, but that $n^{-1}\sum_{k=1}^n f(X_k) - \pi(f)$ does not converge to 0 in probability for some bounded functions $f$. That is, the marginal distribution of $X_n$ converges to $\pi$ but a weak law of large numbers fails to hold.
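Both halves of Example 1 can be observed numerically. Since $\pi = (1/2,1/2)$ for every $\theta$, the deviation $\mathbb{P}(X_n = 0) - 1/2$ is multiplied by $1-2\theta_n$ at each step, so it vanishes when $\sum_n\theta_n = +\infty$. The sketch assumes the hypothetical sequence $\theta_n = 1/(n+3)$: it computes the exact marginal law, checks the value of $D(\theta_n,\theta_{n-1})$, and simulates independent runs whose time averages of $\mathbb{1}\{X_k = 0\}$ keep a large spread.

```python
import numpy as np

def P(theta):
    # P_theta(0, 0) = P_theta(1, 1) = 1 - theta; pi = (1/2, 1/2) for every theta
    return np.array([[1.0 - theta, theta], [theta, 1.0 - theta]])

def D(t1, t2):
    # sup over x of the total variation distance (sum of |.|) between rows
    return np.abs(P(t1) - P(t2)).sum(axis=1).max()

def theta(n):
    # hypothetical choice: decreases to 0 and sum_n theta(n) = +infinity
    return 1.0 / (n + 3)

# Exact marginal law of X_n started at X_0 = 0.
law = np.array([1.0, 0.0])
for n in range(5000):
    law = law @ P(theta(n))          # law of X_{n+1} = (law of X_n) P_{theta_n}

# Monte Carlo: time averages n^{-1} sum_k 1{X_k = 0} over independent runs.
rng = np.random.default_rng(0)
n_runs, n_steps = 200, 20_000
x = np.zeros(n_runs, dtype=int)
time_in_0 = np.zeros(n_runs)
for n in range(n_steps):
    flip = rng.random(n_runs) < theta(n)   # switch state with probability theta_n
    x = np.where(flip, 1 - x, x)
    time_in_0 += (x == 0)
averages = time_in_0 / n_steps
```

The marginal is within $10^{-6}$ of $1/2$, whereas the standard deviation of `averages` across runs stays of order $0.3$, consistent with the failure of the weak law of large numbers.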

This raises the question of how to construct AMCMC when A1(i), A1(ii), A3 and A4 do not hold uniformly for all $\theta\in\Theta$. When these assumptions hold uniformly on any compact subset of $\Theta$ and the adaptation is based on stochastic approximation, one approach is to stop the adaptation, or to reproject $\theta_n$ back onto $\mathcal{K}$ whenever $\theta_n\notin\mathcal{K}$, for some fixed compact $\mathcal{K}$ of $\Theta$. A more elaborate strategy is Chen's truncation method which, roughly speaking, reinitializes the algorithm with a larger compact whenever $\theta_n\notin\mathcal{K}_l$ (Chen and Zhu (1986); Chen et al. (1988)). A third strategy consists in proving a drift condition on the bivariate process $\{(X_n,\theta_n), n\ge0\}$ in order to ensure the stability of the process (Andrieu and Tadic (2008), see also Benveniste et al. (1987)). This question is, however, out of the scope of this paper; the use of Chen's truncation method to weaken our assumptions is addressed in Atchade and Fort (2008).

2.4.4. Comparison with the literature. The convergence of AMCMC has been considered in a number of early works, most of them under a geometric ergodicity assumption. Haario et al. (2001) proved the convergence of the adaptive Random Walk Metropolis (ARWM) when the state space is bounded. Their results were generalized to unbounded spaces in Atchade and Rosenthal (2005), assuming the diminishing adaptation assumption and a geometric drift condition of the form
$$P_\theta V(x) \le \lambda V(x) + b\,\mathbb{1}_C(x), \qquad (2)$$
for $\lambda\in(0,1)$, $b<\infty$ and $\theta\in\Theta$.

Andrieu and Moulines (2006) undertook a thorough analysis of adaptive chains under the geometric drift condition (2) and proved a strong law of large numbers and a central limit theorem. Andrieu and Atchade (2007) give a theoretical discussion of the efficiency of AMCMC under (2).

Roberts and Rosenthal (2007) improve on the literature by relaxing the convergence rate assumption on the kernels. They prove the convergence of the marginals and a weak law of large numbers for bounded functions. But their analysis requires a uniform control on certain moments of the drift function, a condition which is easily checked in the geometric case (i.e. when A2 or A4 is replaced with (2)). Until recently, it was an open question in the polynomial case, but it has been solved by Bai (2008), contemporaneously with our work, who proves that such a control holds under conditions which are essentially of the form A2.

Yang (2007) tackles some open questions mentioned in Roberts and Rosenthal (2007) by providing sufficient conditions, close to the conditions we give in Theorems 2.1 and 2.5, to ensure convergence of the marginals and a weak law of large numbers for bounded functions. The conditions in (Yang, 2007, Theorems 3.1 and 3.2) are stronger than our conditions. But we have noted some gaps and mistakes in the proofs of these theorems.

2.4.5. Comments on the methods of proof. The proof of Theorem 2.1 is based on an argument extended from Roberts and Rosenthal (2007), which can be sketched heuristically as follows. For $N$ large enough, we can expect $P_{\theta_n}^N(X_n,\cdot)$ to be within $\epsilon$ of $\pi$ (by ergodicity). On the other hand, since the adaptation is diminishing, by waiting long enough we can find $n$ such that the distribution of $X_{n+N}$ given $(X_n,\theta_n)$ is within $\epsilon$ of $P_{\theta_n}^N(X_n,\cdot)$. Combining these two arguments, we can then conclude that the distribution of $X_{n+N}$ is within $2\epsilon$ of $\pi$. This is essentially the argument of Roberts and Rosenthal (2007). The difficulty with this argument is that the distance between $P_\theta^N(x,\cdot)$ and $\pi$ depends in general on $x$ and can rarely be bounded uniformly in $x$. We solve this problem here by introducing some level set $C$ of $V$ and by using two basic facts: (i) under A1(i), the process cannot wait too long before coming back to $C$; (ii) under A1(ii-iii), a bound on the distance between $P_\theta^N(x,\cdot)$ and $\pi$ uniformly in $x$, for $x\in C$, is possible.

The proof of Theorem 2.5 is based on a resolvent kernel approach that we adapted from Merlevede et al. (2006) (see also Maxwell and Woodroofe (2000)), combined with martingale theory. Another possible route to the SLLN is the Poisson equation technique, which has been used to study adaptive MCMC in Andrieu and Moulines (2006). Under A3 and A4, a solution $g_\theta$ to the Poisson equation with transition kernel $P_\theta$ exists for any $f\in\mathcal{L}_{V^\beta}$, $0\le\beta<1-\alpha$, and $g_\theta\in\mathcal{L}_{V^{\beta+\alpha}}$. But in order to use $\{g_\theta, \theta\in\Theta\}$ to obtain a SLLN for $f$, we typically need to control $|g_\theta - g_{\theta'}|$, which overall can be expensive. Here we avoid these pitfalls by introducing the resolvent $g_a^{(l)}(x,\theta)$ of the process $\{X_n\}$, defined by
$$g_a^{(l)}(x,\theta) := \sum_{j\ge0}(1-a)^{j+1}\,\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\big], \qquad x\in\mathsf{X},\ \theta\in\Theta,\ a\in(0,1),\ l\ge0.$$
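In the non-adaptive special case ($\Theta$ reduced to one point, so that $\mathbb{E}^{(l)}_{x,\theta}[\bar f(X_j)] = P^j\bar f(x)$) on a finite state space, the series is geometric and $g_a = (1-a)(I-(1-a)P)^{-1}\bar f$. The sketch below, with a hypothetical kernel and function, checks this closed form against the truncated series; it says nothing about the adaptive case, where no such closed form is available.

```python
import numpy as np

def resolvent(P, f, a):
    """g_a = sum_{j>=0} (1-a)^{j+1} P^j fbar for a fixed kernel P, with fbar = f - pi(f)."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    fbar = f - pi @ f
    # geometric series in (1-a) P: (1-a) (I - (1-a) P)^{-1} fbar
    g = (1 - a) * np.linalg.solve(np.eye(len(f)) - (1 - a) * P, fbar)
    return g, fbar

P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])  # pi = (1/4, 1/2, 1/4)
f = np.array([1.0, -2.0, 3.0])
a = 0.1
g, fbar = resolvent(P, f, a)

# truncated series sum_{j<400} (1-a)^{j+1} P^j fbar, for comparison
series = sum((1 - a) ** (j + 1) * np.linalg.matrix_power(P, j) @ fbar for j in range(400))
```

The resolvent solves $g_a - (1-a)Pg_a = (1-a)\bar f$ and satisfies $\pi(g_a) = 0$.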



3. Examples 

3.1. A toy example. We first consider an example discussed in Atchade and Rosenthal (2005) (see also Roberts and Rosenthal (2007)). Let $\pi$ be a target density on the integers $\{1,\dots,K\}$, $K>1$. Let $\{P_\theta, \theta\in\{1,\dots,M\}\}$ be a family of Random Walk Metropolis algorithms with proposal distribution $q_\theta$, the uniform distribution on $\{x-\theta,\dots,x-1,x+1,\dots,x+\theta\}$.

Consider the sequence $\{(X_n,\theta_n), n\ge0\}$ defined as follows: given $X_n,\theta_n$,

• the conditional distribution of $X_{n+1}$ is $P_{\theta_n}(X_n,\cdot)$;

• if $X_{n+1} = X_n$, set $\theta_{n+1} = \max(1,\theta_n-1)$ with probability $p_{n+1}$ and $\theta_{n+1} = \theta_n$ otherwise; if $X_{n+1}\ne X_n$, set $\theta_{n+1} = \min(M,\theta_n+1)$ with probability $p_{n+1}$ and $\theta_{n+1} = \theta_n$ otherwise.
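The two steps above can be simulated directly. A minimal sketch, assuming a hypothetical target on $K = 4$ points, $M = 3$ and $p_n = n^{-1/2}$ (so that $p_n\to0$, as required in Proposition 3.1 below); states $\{1,\dots,K\}$ are coded as $\{0,\dots,K-1\}$.

```python
import numpy as np

def toy_adaptive_chain(pi, M, n_iter, p, rng, x0=0, theta0=1):
    """Toy adaptive RWM on {1, ..., K}, coded here as {0, ..., K-1}.
    p(n) plays the role of p_n; returns the path X_1, ..., X_{n_iter}."""
    K = len(pi)
    x, theta = x0, theta0
    xs = np.empty(n_iter, dtype=int)
    for n in range(n_iter):
        # proposal uniform on {x-theta, ..., x-1, x+1, ..., x+theta}
        step = int(rng.integers(1, theta + 1)) * (1 if rng.random() < 0.5 else -1)
        y = x + step
        # proposals outside the state space have pi(y) = 0 and are rejected
        accepted = 0 <= y < K and rng.random() < min(1.0, pi[y] / pi[x])
        x_new = y if accepted else x
        if rng.random() < p(n + 1):
            # shrink theta after a rejection, enlarge it after a move
            theta = max(1, theta - 1) if x_new == x else min(M, theta + 1)
        x = x_new
        xs[n] = x
    return xs

pi = np.array([0.1, 0.2, 0.3, 0.4])          # hypothetical target on K = 4 points
xs = toy_adaptive_chain(pi, M=3, n_iter=50_000, p=lambda n: n ** -0.5,
                        rng=np.random.default_rng(0))
freq = np.bincount(xs, minlength=len(pi)) / xs.size
```

Since the proposal step is never zero, $X_{n+1} = X_n$ exactly when the proposal is rejected, which is what the $\theta$-update tests.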

This algorithm defines a non-homogeneous Markov chain, still denoted $\{(X_n,\theta_n), n\ge0\}$, on a canonical probability space endowed with a probability $\mathbb{P}$. The transitions of this Markov process are given by the family of transition kernels $\{P(n;(x,\theta),(dx',d\theta')), n\ge0\}$ where
$$P(n;(x,\theta),(dx',d\theta')) = P_\theta(x,dx')\Big(\mathbb{1}_{x=x'}\big\{p_{n+1}\,\delta_{1\vee(\theta-1)}(d\theta') + (1-p_{n+1})\,\delta_\theta(d\theta')\big\} + \mathbb{1}_{x\ne x'}\big\{p_{n+1}\,\delta_{M\wedge(\theta+1)}(d\theta') + (1-p_{n+1})\,\delta_\theta(d\theta')\big\}\Big).$$

In this example, each kernel $P_\theta$ is uniformly ergodic: $P_\theta$ is $\phi$-irreducible, aperiodic, possesses an invariant probability measure $\pi$ and
$$\lim_{n\to+\infty}\sup_{x\in\mathsf{X}}\|P_\theta^n(x,\cdot) - \pi(\cdot)\|_{TV} = 0.$$
Since $\Theta$ is finite, this implies that A1(ii) (resp. A5) holds with $V = 1$ (resp. $\mathcal{D} = \mathsf{X}$ and $\beta = 0$). Furthermore, $\mathbb{E}_{\xi_1,\xi_2}[D(\theta_n,\theta_{n+1})] \le 2p_{n+1}$, so that B1 (resp. B2) holds with any probability measures $\xi_1,\xi_2$ (resp. with $\mathcal{D} = \mathsf{X}$) provided $p_n\to0$. By Lemma 2.3 combined with Theorem 2.1, and by Proposition 2.6, we have

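Proposition 3.1. Assume $\lim_n p_n = 0$. For any probability distributions $\xi_1,\xi_2$ on $\mathsf{X}$, $\Theta$,

(i) $\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[f(X_n)] - \pi(f)\big|\to0$;
(ii) for any bounded function $f$,
$$n^{-1}\sum_{k=1}^n f(X_k)\to\pi(f), \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.}$$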

3.2. The adaptive Random Walk Metropolis of Haario et al. (2001). We illustrate our results with the adaptive Random Walk Metropolis of Haario et al. (2001). The Random Walk Metropolis (RWM) algorithm is a popular MCMC algorithm (Hastings (1970); Metropolis et al. (1953)). Let $\pi$ be a target density, absolutely continuous w.r.t. the Lebesgue measure $\mu_{Leb}$, with density still denoted by $\pi$. Choose a proposal distribution with density w.r.t. $\mu_{Leb}$ denoted $q$, and assume that $q$ is a positive symmetric density on $\mathbb{R}^p$. The algorithm generates a Markov chain $\{X_n, n\ge0\}$ with invariant distribution $\pi$ as follows. Given $X_n = x$, a new value $Y = x + Z$ is proposed, where $Z$ is generated from $q(\cdot)$. Then we either 'accept' $Y$ and set $X_{n+1} = Y$ with probability $\alpha(x,Y) = \min(1,\pi(Y)/\pi(x))$, or we 'reject' $Y$ and set $X_{n+1} = x$.
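A minimal sketch of one RWM step with a Gaussian proposal $N(0,\Sigma)$, computing $\alpha(x,Y)$ on the log scale to avoid overflow. The usage runs the chain on a standard Gaussian target in dimension 1, chosen here only because its moments are known; the proposal standard deviation 2.4 is an arbitrary choice.

```python
import numpy as np

def rwm_step(x, log_pi, chol, rng):
    """One RWM step: propose Y = x + Z, Z ~ N(0, Sigma) with Sigma = chol @ chol.T,
    accept with probability alpha(x, Y) = min(1, pi(Y)/pi(x)). Returns (X_{n+1}, alpha)."""
    y = x + chol @ rng.standard_normal(x.shape[0])
    alpha = np.exp(min(0.0, log_pi(y) - log_pi(x)))   # min(1, pi(y)/pi(x)) on the log scale
    return (y if rng.random() < alpha else x), alpha

def log_pi(x):
    # standard Gaussian log-density, up to an additive constant
    return -0.5 * float(x @ x)

rng = np.random.default_rng(1)
chol = np.array([[2.4]])
x = np.zeros(1)
samples = np.empty(50_000)
for n in range(samples.size):
    x, _ = rwm_step(x, log_pi, chol, rng)
    samples[n] = x[0]
```

Only the ratio $\pi(Y)/\pi(x)$ enters, so $\pi$ needs to be known up to a normalizing constant.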

For definiteness, we will assume that $q$ is a zero-mean multivariate Gaussian distribution (this assumption can be replaced by regularity conditions and moment conditions on the proposal distribution). Given a proposal distribution with finite second moments, the convergence rate of the RWM kernel depends mainly on the tail behavior of the target distribution $\pi$. If $\pi$ is super-exponential in the tails with regular contours, then the RWM kernel is typically geometrically ergodic (Jarner and Hansen (2000)). Otherwise, it is typically sub-geometric (Fort and Moulines (2000, 2003); Douc et al. (2004)).
Define
$$\mu_\pi := \int_{\mathsf{X}}x\,\pi(x)\,\mu_{Leb}(dx), \qquad \Sigma_\pi := \int_{\mathsf{X}}xx^T\,\pi(x)\,\mu_{Leb}(dx) - \mu_\pi\mu_\pi^T,$$
resp. the expectation and the covariance matrix of $\pi$ ($\cdot^T$ denotes the transpose operation). Theoretical results suggest setting the variance-covariance matrix $\Sigma$ of the proposal distribution to $\Sigma = c_\star\Sigma_\pi$, where $c_\star$ is set so as to reach the optimal acceptance rate $\bar\alpha$ in stationarity (typically $\bar\alpha$ is set to values around 0.3-0.4). See e.g. Roberts and Rosenthal (2001) for more details. Haario et al. (2001) have proposed an adaptive algorithm to learn $\Sigma_\pi$ during the simulation. This algorithm has been studied in detail in Andrieu and Moulines (2006) under the assumption that $\pi$ is super-exponential in the tails. An adaptive algorithm to find the optimal value $c_\star$ has been proposed in Atchade and Rosenthal (2005) (see also Atchade (2006)) and studied under the assumption that $\pi$ is super-exponential in the tails. We extend these results to cases where $\pi$ is sub-exponential in the tails.

Let $\Theta_+$ be a convex compact subset of the cone of $p\times p$ symmetric positive definite matrices, endowed with the Schur norm $|\cdot|_s$, $|A|_s := \sqrt{\mathrm{Tr}(A^TA)}$. For example, for $a, M>0$, $\Theta_+ = \{A + a\,\mathrm{Id}: A \text{ is symmetric positive semidefinite and } |A|_s\le M\}$. Next, for $-\infty<\kappa_l<\kappa_u<\infty$ and $\Theta_\mu$ a compact subset of $\mathsf{X}$, we introduce the space $\Theta = \Theta_\mu\times\Theta_+\times[\kappa_l,\kappa_u]$. For $\theta = (\mu,\Sigma,c)\in\Theta$, denote by $P_\theta$ the transition kernel of the RWM algorithm with proposal $q_\theta$, where $q_\theta$ stands for the multivariate Gaussian distribution with variance-covariance matrix $e^c\Sigma$.

Consider the adaptive RWM defined as follows.

Algorithm 3.1. Initialization: Let $\bar\alpha$ be the target acceptance probability. Choose $X_0\in\mathsf{X}$, $(\mu_0,\Sigma_0,c_0)\in\Theta$.
Iteration: Given $(X_n,\mu_n,\Sigma_n,c_n)$:

1: Generate $Z_{n+1}\sim q_{\theta_n}\,d\mu_{Leb}$ and set $Y_{n+1} = X_n + Z_{n+1}$. With probability $\alpha(X_n,Y_{n+1})$ set $X_{n+1} = Y_{n+1}$, and with probability $1-\alpha(X_n,Y_{n+1})$ set $X_{n+1} = X_n$.

2: Set
$$\mu = \mu_n + (n+1)^{-1}(X_{n+1} - \mu_n), \qquad (3)$$
$$\Sigma = \Sigma_n + (n+1)^{-1}\big[(X_{n+1} - \mu_n)(X_{n+1} - \mu_n)^T - \Sigma_n\big], \qquad (4)$$
$$c = c_n + \frac{1}{n+1}\big(\alpha(X_n,Y_{n+1}) - \bar\alpha\big). \qquad (5)$$

3: If $(\mu,\Sigma,c)\in\Theta$, set $\mu_{n+1} = \mu$, $\Sigma_{n+1} = \Sigma$ and $c_{n+1} = c$. Otherwise, set $\mu_{n+1} = \mu_n$, $\Sigma_{n+1} = \Sigma_n$ and $c_{n+1} = c_n$.
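A minimal sketch of Algorithm 3.1 that follows steps 1-3 literally, with proposal covariance $e^c\Sigma$. The compact set $\Theta$ is instantiated with hypothetical constants (`R`, `a`, `M_norm`, `kappa_l`, `kappa_u`, with $\Theta_\mu = [-R,R]^p$ and $\Theta_+$ of the form given above), and the target $\log\pi(x) = -\sum_i(1+x_i^2)^{m/2}$, $m = 1/2$, is an illustrative sub-exponential density, not one taken from the text.

```python
import numpy as np

# Hypothetical compact Theta = Theta_mu x Theta_+ x [kappa_l, kappa_u]:
# Theta_mu = [-R, R]^p and Theta_+ = {A + a Id : A symmetric PSD, |A|_s <= M_norm}.
R, a, M_norm, kappa_l, kappa_u = 100.0, 1e-2, 1e4, -5.0, 5.0

def in_theta(mu, Sigma, c):
    A = Sigma - a * np.eye(len(mu))
    return bool(np.all(np.abs(mu) <= R)
                and np.all(np.linalg.eigvalsh(A) >= 0.0)   # A positive semidefinite
                and np.linalg.norm(A, "fro") <= M_norm       # Schur (Frobenius) norm
                and kappa_l <= c <= kappa_u)

def adaptive_rwm(log_pi, x0, mu0, Sigma0, c0, n_iter, alpha_bar, rng):
    """Algorithm 3.1: RWM with proposal N(0, e^c Sigma) and updates (3)-(5)."""
    x, mu, Sigma, c = (np.asarray(x0, float), np.asarray(mu0, float),
                       np.asarray(Sigma0, float), float(c0))
    assert in_theta(mu, Sigma, c), "initial parameter must lie in Theta"
    p = x.size
    chain = np.empty((n_iter, p))
    for n in range(n_iter):
        # Step 1: Z ~ N(0, e^c Sigma), Y = X_n + Z, accept with probability alpha(X_n, Y)
        y = x + np.linalg.cholesky(np.exp(c) * Sigma) @ rng.standard_normal(p)
        alpha = np.exp(min(0.0, log_pi(y) - log_pi(x)))
        x_new = y if rng.random() < alpha else x
        # Step 2: candidate updates (3), (4), (5)
        d = x_new - mu
        mu_cand = mu + d / (n + 1)
        Sigma_cand = Sigma + (np.outer(d, d) - Sigma) / (n + 1)
        c_cand = c + (alpha - alpha_bar) / (n + 1)
        # Step 3: keep the new parameter only if it stays in Theta
        if in_theta(mu_cand, Sigma_cand, c_cand):
            mu, Sigma, c = mu_cand, Sigma_cand, c_cand
        x = x_new
        chain[n] = x
    return chain, mu, Sigma, c

# Hypothetical sub-exponential target on R^2.
def log_pi(x, m=0.5):
    return -float(np.sum((1.0 + x ** 2) ** (m / 2)))

rng = np.random.default_rng(0)
chain, mu, Sigma, c = adaptive_rwm(log_pi, np.zeros(2), np.zeros(2), np.eye(2), 0.0,
                                   n_iter=5_000, alpha_bar=0.3, rng=rng)
```

Step 3 keeps $\theta_n$ in the compact $\Theta$ at all times; for instance, the first candidate $\Sigma$ from (4) has rank one and is rejected, so $\Sigma_1 = \Sigma_0$.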
This is an algorithmic description of a random process $\{(X_n,\theta_n), n\ge0\}$ which is a non-homogeneous Markov chain with successive transition kernels $\{P(n;(x,\theta),(dx',d\theta')), n\ge0\}$ given by
$$P(n;(x,\theta),(dx',d\theta')) = \int q_\theta(z)\big\{\alpha(x,x+z)\,\delta_{x+z}(dx') + (1-\alpha(x,x+z))\,\delta_x(dx')\big\}\Big(\mathbb{1}_{\{\phi(\theta,x+z,x')\in\Theta\}}\,\delta_{\phi(\theta,x+z,x')}(d\theta') + \mathbb{1}_{\{\phi(\theta,x+z,x')\notin\Theta\}}\,\delta_\theta(d\theta')\Big)\,\mu_{Leb}(dz),$$
where $\phi$ is the function defined from the right-hand side expressions of (3) to (5). Integrating over $\theta'$, we see that for any $A\in\mathcal{X}$,
$$\int_{A\times\Theta}P(n;(x,\theta),(dx',d\theta')) = P_\theta(x,A).$$

Lemma 3.2. Assume that $\pi$ is bounded from below and from above on compact sets. Then any compact subset $C$ of $\mathsf{X}$ with $\mu_{Leb}(C)>0$ satisfies A3.

Proof. See (Roberts and Tweedie, 1996, Theorem 2.2). □

Following Fort and Moulines (2000), we assume that $\pi$ is sub-exponential in the tails:

D1 $\pi$ is positive and continuous on $\mathbb{R}^p$, and twice continuously differentiable in the tails.
D2 there exist $m\in(0,1)$, positive constants $d_i<D_i$, $i = 0,1,2$, and $r, R>0$ such that for $|x|\ge R$:
(i) $\left\langle\frac{\nabla\pi(x)}{|\nabla\pi(x)|},\frac{x}{|x|}\right\rangle \le -r$,
(ii) $d_0|x|^m \le -\log\pi(x) \le D_0|x|^m$,
(iii) $d_1|x|^{m-1} \le |\nabla\log\pi(x)| \le D_1|x|^{m-1}$,
(iv) $d_2|x|^{m-2} \le |\nabla^2\log\pi(x)| \le D_2|x|^{m-2}$.

Examples of target densities that satisfy D1-D2 are the Weibull distributions on $\mathbb{R}$ with density $\pi(x)\propto|x|^{m-1}\exp(-\beta|x|^m)$ (for large $|x|$), $\beta>0$, $m\in(0,1)$. Multidimensional examples are provided in Fort and Moulines (2000).
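For the one-dimensional Weibull example, D2(ii)-(iii) say that $-\log\pi(x)/|x|^m$ and $|\nabla\log\pi(x)|/|x|^{m-1}$ stay between positive constants in the tails. A numerical illustration at a few points, with hypothetical parameters $m = 1/2$, $\beta = 1$ (not a proof):

```python
import numpy as np

m, beta = 0.5, 1.0      # hypothetical Weibull parameters, m in (0, 1)

def neg_log_pi(x):
    # -log pi(x) for pi(x) proportional to |x|^{m-1} exp(-beta |x|^m), up to an additive constant
    return beta * np.abs(x) ** m - (m - 1) * np.log(np.abs(x))

def grad_log_pi(x):
    # d/dx log pi(x) = (m-1)/x - beta m sign(x) |x|^{m-1}
    return (m - 1) / x - beta * m * np.sign(x) * np.abs(x) ** (m - 1)

xs = np.array([1e2, 1e4, 1e6, 1e8])
ratio_ii = neg_log_pi(xs) / np.abs(xs) ** m                  # tends to beta
ratio_iii = np.abs(grad_log_pi(xs)) / np.abs(xs) ** (m - 1)  # tends to beta * m
```

The polynomial prefactor $|x|^{m-1}$ only contributes lower-order terms, so both ratios converge to $\beta$ and $\beta m$ respectively.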

3.2.1. Law of large numbers for exponential functions. In this subsection, we assume that

D3 there exist $s_\star>0$, $0<\upsilon<1-m$ and $0<\eta<1$ such that, as $|x|\to+\infty$,
$$\sup_{\theta\in\Theta}\int_{\{z,\,|z|\ge\eta|x|^\upsilon\}}\Big(1\vee\frac{\pi(x)}{\pi(x+z)}\Big)^{s_\star}q_\theta(z)\,\mu_{Leb}(dz) = o\big(|x|^{2(m-1)}\big).$$

A sufficient condition for D3 is that $\pi(x+z)\ge\pi(x)\pi(z)$ for any $x$ large enough and $|z|\ge\eta|x|^\upsilon$ (which holds true for Weibull distributions with $0<m<1$). Indeed, we then have
$$\int_{\{z,\,|z|\ge\eta|x|^\upsilon\}}\Big(1\vee\frac{\pi(x)}{\pi(x+z)}\Big)^{s_\star}q_\theta(z)\,\mu_{Leb}(dz) \le C\exp\big(-\lambda_\star\eta^2|x|^{2\upsilon}\big)\sup_{\theta\in\Theta}\int\exp\big(s_\star D_0|z|^m\big)\exp\big(\lambda_\star|z|^2\big)\,q_\theta(z)\,\mu_{Leb}(dz)$$
for some constant $C<+\infty$, and $\lambda_\star>0$ such that the right-hand side is finite.

Lemma 3.3. Assume D1-D3. For $0<s<s_\star$, define $V_s(x) := 1 + \pi^{-s}(x)$. There exists $0<s<s_\star$ such that for any $\alpha\in(0,1)$, there exist positive constants $b, c$ and a compact set $C$ such that
$$\sup_{\theta\in\Theta}P_\theta V_s(x) \le V_s(x) - c\,V_s^{1-\alpha}(x) + b\,\mathbb{1}_C(x).$$
Hence A2-A5 hold.
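A one-dimensional numerical sanity check of this drift inequality, not a proof: it uses a single fixed kernel, whereas the lemma is uniform in $\theta$. With hypothetical choices $m = 1/2$, $\beta = 1$, $s = 1/2$, drift exponent $1/2$ and a Gaussian proposal with standard deviation 1, it evaluates $P V_s(x) - V_s(x) = \int q(z)\,\alpha(x,x+z)\,(V_s(x+z) - V_s(x))\,dz$ by the trapezoidal rule at a few points in the right tail of the Weibull density. The acceptance probability is called `accept` in the code to avoid a clash with the drift exponent.

```python
import numpy as np

m, beta, s = 0.5, 1.0, 0.5   # hypothetical Weibull tail parameters and exponent s
sigma = 1.0                  # hypothetical proposal standard deviation (a single theta)

def log_pi(x):
    # unnormalized Weibull log-density for x > 0
    return (m - 1) * np.log(x) - beta * x ** m

def V(x):
    # V_s = 1 + pi^{-s}; using the unnormalized pi only rescales the pi^{-s} part
    return 1.0 + np.exp(-s * log_pi(x))

def drift(x, half_width=10.0, n_grid=20_001):
    """P V(x) - V(x) = E[accept(x, x+Z) (V(x+Z) - V(x))], Z ~ N(0, sigma^2), by the
    trapezoidal rule on [-half_width*sigma, half_width*sigma]; needs x > half_width*sigma."""
    z = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    q = np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    accept = np.exp(np.minimum(0.0, log_pi(x + z) - log_pi(x)))
    integrand = q * accept * (V(x + z) - V(x))
    h = z[1] - z[0]
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

points = (50.0, 200.0, 800.0)
drifts = [drift(x) for x in points]
# (P V - V) / V^{1 - alpha} with drift exponent alpha = 1/2
ratios = [d / V(x) ** 0.5 for d, x in zip(drifts, points)]
```

At these points the drift is negative and the ratio to $V_s^{1/2}$ stays below a negative constant, in line with $P V_s \le V_s - cV_s^{1-\alpha}$ outside a compact set.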

Lemma 3.4. Assume D1-D3. B2 holds, and B1 holds for any probability measures $\xi_1,\xi_2$ such that $\int|\ln\pi|^{2/m}\,d\xi_1 < +\infty$.

The proofs of Lemmas 3.3 and 3.4 are in Appendix C.

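Proposition 3.5. Assume D1-D3. Consider the sequence $\{X_n, n\ge0\}$ given by Algorithm 3.1.

(i) For any probability measures $\xi_1,\xi_2$ such that $\int|\ln\pi|^{2/m}\,d\xi_1 < +\infty$,
$$\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[f(X_n)] - \pi(f)\big|\to0.$$
(ii) There exists $0<s<s_\star$ such that for any probability measures $\xi_1,\xi_2$ such that $\int\pi^{-s}\,d\xi_1 < +\infty$, and any function $f\in\mathcal{L}_{\pi^{-r}}$, $0\le r<s$,
$$n^{-1}\sum_{k=1}^n f(X_k)\to\pi(f), \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.}$$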

The drift function $V_s$ exhibited in Lemma 3.3 is designed for limit theorems relative to functions $f$ increasing as $\exp(\beta|x|^m)$. This implies a condition on the initial distribution $\xi_1$, which has to possess sub-exponential moments (see Proposition 3.5(ii)); this condition always holds with $\xi_1 = \delta_x$, $x\in\mathsf{X}$.

3.2.2. Law of large numbers for polynomially increasing functions. Proposition 3.5 also addresses the case when $f$ is of the form $1+|x|^r$, $r>0$. Nevertheless, the conditions on $\xi_1$ and the assumption D3 can be weakened in that case.

We have to find a drift function $V$ such that $V^{1-\alpha}(x)\sim1+|x|^{r+t}$ for some $\alpha\in(0,1)$, $t>0$. Under D3, this can be obtained from the proof of Lemma 3.3, and this yields $V(x)\sim1+|x|^{r+t+2-m}$ (apply Jensen's inequality to the drift inequality (24) with the concave function $\phi(u)\sim[\ln u]^{(r+t+2-m)/m}$; see (Jarner and Roberts, 2002, Lemma 3.5) for similar calculations). Hence, the condition on $\xi_1$ becomes $\xi_1(|x|^{r+t+2-m}) < +\infty$ for some $t>0$.

Drift inequalities with $V\sim(-\ln\pi)^s$ for some $s>2/m-1$ can also be derived by direct computations: in that case, D3 can be removed. Details are omitted and left to the interested reader.

To conclude, observe that these discussions relative to polynomially increasing functions can be extended to any function $f$ which is a concave transformation of $\pi^{-s}$.



4. Proofs of the results of Section 2 

For a set $C\in\mathcal{X}$, define the hitting time on $C\times\Theta$ of $\{Z_n, n\ge0\}$ by $\sigma_C := \inf\{n\ge0, Z_n\in C\times\Theta\}$. If $\pi(|f|)<+\infty$, we set $\bar f := f - \pi(f)$.



4.1. Preliminary results. We gather some useful preliminary results in this section. 
Section 4.1.1 gives an approximation of the marginal distribution of the adaptive chain by 
the distribution of a related Markov chain. In Section 4.1.2, we develop various bounds 
for modulated moments of the adaptive chain as consequences of the drift conditions. In 
Section 4.1.3 we bound the expected return times of the adaptive chain to level sets of the 
drift function $V$. The culminating result of this subsection is Theorem 4.10, which gives an explicit bound on the resolvent function $g_a^{(l)}(x,\theta)$.



4.1.1. Optimal coupling. 

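Lemma 4.1. For any integers $l \ge 0$, $N \ge 2$, any measurable bounded function $f$ on $\mathsf{X}^N$ and any $(x,\theta) \in \mathsf{X}\times\Theta$,
$$\Big|\mathbb{E}^{(l)}_{x,\theta}\big[f(X_1,\dots,X_N)\big] - \int P_\theta(x,dx_1)\prod_{k=2}^{N}P_\theta(x_{k-1},dx_k)\,f(x_1,\dots,x_N)\Big| \le |f|_1\sum_{j=1}^{N-1}\sum_{i=1}^{j}\mathbb{E}^{(l)}_{x,\theta}\big[D(\theta_i,\theta_{i-1})\big].$$

Proof. We can assume w.l.g. that $|f|_1 \le 1$. Set $z_k = (x_k, t_k)$. With the convention that $\prod_{k=a}^{b} a_k = 1$ for $a > b$, and upon noting that $\int_{\mathsf{X}} P_\theta(x,dx')h(x') = \int_{\mathsf{X}\times\Theta} P_l(0;(x,\theta),(dx',d\theta'))h(x')$ for any bounded measurable function $h : \mathsf{X} \to \mathbb{R}$,
$$\begin{aligned}
A &:= \Big|\mathbb{E}^{(l)}_{x,\theta}\big[f(X_1,\dots,X_N)\big] - \int P_\theta(x,dx_1)\prod_{k=2}^{N}P_\theta(x_{k-1},dx_k)\,f(x_1,\dots,x_N)\Big|\\
&= \Big|\sum_{j=1}^{N-1}\int P_l(0;(x,\theta),dz_1)\prod_{k=2}^{j}P_l(k-1;z_{k-1},dz_k)\,\big\{P_l(j;z_j,dz_{j+1}) - P_l(0;(x_j,\theta),dz_{j+1})\big\}\prod_{k=j+2}^{N}P_l(0;(x_{k-1},\theta),dz_k)\,f(x_1,\dots,x_N)\Big|\\
&\le \sum_{j=1}^{N-1}\int P_l(0;(x,\theta),dz_1)\prod_{k=2}^{j}P_l(k-1;z_{k-1},dz_k)\,\sup_{x\in\mathsf{X}}\|P_{t_j}(x,\cdot) - P_\theta(x,\cdot)\|_{TV},
\end{aligned}$$
where we used that
$$\int\prod_{k=j+2}^{N}P_l(0;(x_{k-1},\theta),dz_k)\,f(x_1,\dots,x_N)$$
is a function $\Xi(x_1,\dots,x_{j+1})$, bounded by $1$, that does not depend upon $t_k$, $k \le N$, and that for any bounded function $\Xi$ on $\mathsf{X}^{j+1}$,
$$\Big|\int_{\mathsf{X}\times\Theta}\big\{P_l(j;z_j,dz_{j+1}) - P_l(0;(x_j,\theta),dz_{j+1})\big\}\Xi(x_1,\dots,x_{j+1})\Big| = \Big|\int_{\mathsf{X}}\big\{P_{t_j}(x_j,dx_{j+1}) - P_\theta(x_j,dx_{j+1})\big\}\Xi(x_1,\dots,x_{j+1})\Big| \le \sup_{x\in\mathsf{X}}\|P_{t_j}(x,\cdot) - P_\theta(x,\cdot)\|_{TV}\,|\Xi|_1.$$
Hence
$$A \le \sum_{j=1}^{N-1}\mathbb{E}^{(l)}_{x,\theta}\Big[\sup_{x\in\mathsf{X}}\|P_{\theta_j}(x,\cdot) - P_{\theta_0}(x,\cdot)\|_{TV}\Big] \le \sum_{j=1}^{N-1}\sum_{i=1}^{j}\mathbb{E}^{(l)}_{x,\theta}\Big[\sup_{x\in\mathsf{X}}\|P_{\theta_i}(x,\cdot) - P_{\theta_{i-1}}(x,\cdot)\|_{TV}\Big].$$
□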

Lemma 4.2. Let $\mu, \nu$ be two probability distributions. There exist a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and random variables $X, Y$ on $(\Omega, \mathcal{F})$ such that $X \sim \mu$, $Y \sim \nu$ and $\mathbb{P}(X = Y) = 1 - \|\mu - \nu\|_{TV}$.
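In the discrete case such a coupling can be written down explicitly: put the common mass $\min(\mu,\nu)$ on the diagonal and spread the two residual masses independently. The following sketch is a finite-state illustration only (not taken from the cited reference); it takes $\|\mu-\nu\|_{TV} = \sup_A|\mu(A)-\nu(A)|$ and checks that the diagonal carries exactly $1 - \|\mu-\nu\|_{TV}$:

```python
def tv_distance(p, q):
    """sup_A |mu(A) - nu(A)| for pmfs p, q on {0, ..., n-1}."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def maximal_coupling(p, q):
    """Joint pmf J[i][j] with marginals p and q and P(X = Y) = 1 - TV(p, q)."""
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    w = sum(overlap)                       # equals 1 - TV(p, q)
    joint = [[0.0] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] = overlap[i]           # X = Y on the common part
    if w < 1.0:
        # Residual masses have disjoint supports, so they never hit the diagonal.
        rp = [a - c for a, c in zip(p, overlap)]
        rq = [b - c for b, c in zip(q, overlap)]
        for i in range(n):
            for j in range(n):
                joint[i][j] += rp[i] * rq[j] / (1.0 - w)
    return joint


p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
J = maximal_coupling(p, q)
print(sum(J[i][i] for i in range(3)), 1 - tv_distance(p, q))  # both 0.7 up to rounding
```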

The proof can be found e.g. in (Roberts and Rosenthal, 2004, Proposition 3). As a 
consequence of Lemmas 4.1 and 4.2, we have 

Proposition 4.3. Let $l \ge 0$, $N \ge 2$ and set $z = (x,\theta)$. There exists a process $\{(X_k,\tilde X_k), 0 \le k \le N\}$ defined on a probability space endowed with the probability $\bar{\mathbb{P}}^{(l)}_{z,z}$ such that
$$\bar{\mathbb{P}}^{(l)}_{z,z}\big(X_k = \tilde X_k, 0 \le k \le N\big) \ge 1 - \sum_{j=1}^{N-1}\sum_{i=1}^{j}\mathbb{E}^{(l)}_{z}\big[D(\theta_i,\theta_{i-1})\big],$$
$(X_0,\dots,X_N)$ has the $\mathsf{X}$-marginal distribution of $\mathbb{P}^{(l)}_z$ restricted to the time-interval $\{0,\dots,N\}$, and $(\tilde X_0,\dots,\tilde X_N)$ has the same distribution as a homogeneous Markov chain with transition kernel $P_\theta$ and initial distribution $\delta_x$.

4.1.2. Modulated moments for the adaptive chain. Let $V : \mathsf{X} \to [1,+\infty)$ be a measurable function and assume that there exist $C \in \mathcal{X}$, positive constants $b, c$ and $0 < \alpha \le 1$ such that for any $\theta \in \Theta$,
$$P_\theta V \le V - c\,V^{1-\alpha} + b\,\mathbb{1}_C. \qquad (6)$$



Lemma 4.4. Assume (6). There exists $b$ such that for any $0 < \beta \le 1$ and $\theta \in \Theta$: $P_\theta V^\beta \le V^\beta - \beta c\,V^{\beta-\alpha} + b\,\mathbb{1}_C$.

Proof. See (Jarner and Roberts, 2002, Lemma 3.5). □
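To see where the factor $\beta$ comes from, here is a sketch of the argument off $C$ (the behaviour on $C$, which only affects the constant $b$, is handled in the cited reference). It uses Jensen's inequality for the concave map $u \mapsto u^\beta$ and the tangent bound $(1-u)^\beta \le 1 - \beta u$, valid for $u \in [0,1]$ and $0 < \beta \le 1$:

```latex
\begin{align*}
x \notin C:\quad
P_\theta V^\beta(x) &\le \big(P_\theta V(x)\big)^\beta
  \le \big(V(x) - c\,V^{1-\alpha}(x)\big)^\beta
  = V^\beta(x)\big(1 - c\,V^{-\alpha}(x)\big)^\beta\\
&\le V^\beta(x) - \beta c\,V^{\beta-\alpha}(x),
\end{align*}
```

where $0 \le c\,V^{-\alpha}(x) \le 1$ because $0 \le P_\theta V(x) \le V(x) - cV^{1-\alpha}(x)$ off $C$.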

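Proposition 4.5. Assume (6). For any $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$, and any stopping-time $\tau$,
$$c\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau-1}(k\alpha c + 1)^{\alpha^{-1}-1}\Big] \le V(x) + b\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau-1}\big((k+1)\alpha c + 1\big)^{\alpha^{-1}-1}\mathbb{1}_C(X_k)\Big].$$

Proof. The proof can be adapted from (Douc et al., 2004, Proposition 2.1) and (Meyn and Tweedie, 1993, Proposition 11.3.2) and is omitted. □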

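Proposition 4.6. Assume (6).
(i) There exists $b$ such that for any $j \ge 0$, $0 < \beta \le 1$, $l \ge 0$ and $(x,\theta) \in \mathsf{X}\times\Theta$,
$$\mathbb{E}^{(l)}_{x,\theta}\big[V^\beta(X_j)\big] \le V^\beta(x) + bj.$$
(ii) Let $0 < \beta \le 1$ and $0 \le a < 1$. For any stopping-time $\tau$,
$$\mathbb{E}^{(l)}_{x,\theta}\big[(1-a)^\tau V^\beta(X_\tau)\mathbb{1}_{\tau<+\infty}\big] + \mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^j\big\{a\,V^\beta(X_j) + \beta c(1-a)V^{\beta-\alpha}(X_j)\big\}\Big] \le V^\beta(x) + b(1-a)\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^j\mathbb{1}_C(X_j)\Big].$$
(iii) Let $0 < \beta < 1 - \alpha$ and $0 < a < 1$. For any stopping-time $\tau$ and any $q \in [1,+\infty]$,
$$\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^jV^\beta(X_j)\Big] \le a^{1/q-1}(1-a)^{-1/q}(\alpha c)^{-1/q}\,V^{\beta+\alpha/q}(x)\Big(1 + b\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^j\mathbb{1}_C(X_j)\Big]\Big)$$
(with the convention that $1/q = 0$ when $q = +\infty$).

Proof. The proof is done in the case $l = 0$. The general case is similar and omitted. (i) is a trivial consequence of Lemma 4.4. (ii) Let $\beta \le 1$. Set $\tau_N = \tau \wedge N$ and $Y_n = (1-a)^nV^\beta(X_n)$. Then
$$Y_{\tau_N} = Y_0 + \sum_{j=1}^{\tau_N}(Y_j - Y_{j-1}) = Y_0 + \sum_{j=1}^{\tau_N}(1-a)^{j-1}\big((1-a)V^\beta(X_j) - V^\beta(X_{j-1})\big).$$
Hence, since $\{j \le \tau_N\} \in \mathcal{F}_{j-1}$,
$$\begin{aligned}
\mathbb{E}_{x,\theta}\big[Y_{\tau_N}\big] &= V^\beta(x) + \mathbb{E}_{x,\theta}\Big[\sum_{j\ge1}(1-a)^{j-1}\big((1-a)P_{\theta_{j-1}}V^\beta(X_{j-1}) - V^\beta(X_{j-1})\big)\mathbb{1}_{j\le\tau_N}\Big]\\
&\le V^\beta(x) + \mathbb{E}_{x,\theta}\Big[\sum_{j\ge1}(1-a)^{j-1}\Big(-a\,V^\beta(X_{j-1}) + (1-a)\big\{-\beta c\,V^{\beta-\alpha}(X_{j-1}) + b\,\mathbb{1}_C(X_{j-1})\big\}\Big)\mathbb{1}_{j\le\tau_N}\Big],
\end{aligned}$$
where we used Lemma 4.4 in the last inequality. This implies
$$\mathbb{E}_{x,\theta}\big[Y_{\tau_N}\big] + a\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau_N-1}(1-a)^jV^\beta(X_j)\Big] + (1-a)\beta c\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau_N-1}(1-a)^jV^{\beta-\alpha}(X_j)\Big] \le V^\beta(x) + b(1-a)\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau_N-1}(1-a)^j\mathbb{1}_C(X_j)\Big].$$
The result follows when $N \to +\infty$.

(iii) The previous case provides two upper bounds, namely, for $0 < \beta < 1 - \alpha$,
$$a\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^jV^\beta(X_j)\Big] \le V^\beta(x) + b(1-a)\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^j\mathbb{1}_C(X_j)\Big]$$
and
$$(1-a)\big((\beta+\alpha)c\big)\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^jV^\beta(X_j)\Big] \le V^{\beta+\alpha}(x) + b\,\mathbb{E}_{x,\theta}\Big[\sum_{j=0}^{\tau-1}(1-a)^j\mathbb{1}_C(X_j)\Big].$$
We then use the property $[c \le c_1 \wedge c_2] \Rightarrow c \le c_1^{1-1/q}c_2^{1/q}$ for any $q \in [1,+\infty]$. □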



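Proposition 4.7. Assume (6). Let $\{r_n, n \ge 0\}$ be a non-increasing positive sequence. There exists $b$ such that for any $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$, $0 < \beta \le 1$ and $n \ge 0$,
$$\beta c\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k\ge n}r_{k+1}V^{\beta-\alpha}(X_k)\Big] \le r_n\,\mathbb{E}^{(l)}_{x,\theta}\big[V^\beta(X_n)\big] + b\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k\ge n}r_{k+1}\mathbb{1}_C(X_k)\Big].$$

The proof is on the same lines as the proof of Proposition 4.6(ii) and is omitted.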
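4.1.3. Delayed successive visits to an accessible level set of $V$. Let $\mathcal{D} \in \mathcal{X}$ and two positive integers $n_\star$, $N$. Define on $(\Omega, \mathcal{F}, \mathbb{P}^{(l)}_{x,\theta})$ the sequence of $\mathbb{N}$-valued random variables $\{\tau^n, n \ge 0\}$ as
$$\tau^0 \stackrel{\text{def}}{=} \tau_{\mathcal{D}}, \qquad \tau^1 \stackrel{\text{def}}{=} \tau^0 + n_\star + \tau_{\mathcal{D}}\circ\theta^{\tau^0+n_\star}, \qquad \tau^{k+1} \stackrel{\text{def}}{=} \tau^k + N + \tau_{\mathcal{D}}\circ\theta^{\tau^k+N}, \quad k \ge 1.$$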
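On a single recorded path these stopping times are easy to compute. The sketch below assumes $\tau_{\mathcal{D}} = \inf\{n \ge 1 : X_n \in \mathcal{D}\}$ (first visit after time $0$) and takes as input a boolean list `in_D` recording whether $X_n \in \mathcal{D}$; the helper names are hypothetical:

```python
def hitting_time(in_D, start):
    """tau_D o theta^start = inf{n >= 1 : X_{start+n} in D} on the recorded path, or None."""
    for n in range(1, len(in_D) - start):
        if in_D[start + n]:
            return n
    return None


def delayed_visits(in_D, n_star, N):
    """Times tau^0, tau^1, ... that are observable on a finite path.

    tau^0 = tau_D, tau^1 = tau^0 + n_star + tau_D o theta^(tau^0 + n_star),
    tau^(k+1) = tau^k + N + tau_D o theta^(tau^k + N) for k >= 1.
    """
    taus = []
    t = hitting_time(in_D, 0)
    if t is None:
        return taus
    taus.append(t)
    delay = n_star                       # first delay is n_star, later delays are N
    while True:
        start = taus[-1] + delay
        if start >= len(in_D):
            break
        h = hitting_time(in_D, start)
        if h is None:
            break
        taus.append(start + h)
        delay = N
    return taus


print(delayed_visits([True] * 20, n_star=3, N=5))   # [1, 5, 11, 17]
```

With $\mathcal{D} = \mathsf{X}$ the sequence is deterministic, $\tau^{k+1} = \tau^k + N + 1$, and one can check on the output that $\tau^k \ge n_\star + (k-1)N$ for $k \ge 1$.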



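Proposition 4.8. Assume A3 and that there exist $V : \mathsf{X} \to [1,+\infty)$ and a constant $b < +\infty$ such that for any $\theta \in \Theta$, $P_\theta V \le V - 1 + b\,\mathbb{1}_C$. Let $\mathcal{D} \in \mathcal{X}$. Let $n_\star, N$ be two non-negative integers. Then
$$\varepsilon\nu(\mathcal{D})\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}-1}\mathbb{1}_C(X_k)\Big] \le 1,$$
and, if $\sup_{\mathcal{D}}V < +\infty$ and $\nu(\mathcal{D}) > 0$, there exists a (finite) constant $C$ depending upon $\varepsilon$, $\nu(\mathcal{D})$, $\sup_{\mathcal{D}}V$, $b$, $n_\star$, $N$ such that for any $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$ and $k \ge 1$,
$$\mathbb{E}^{(l)}_{x,\theta}\big[\tau^k\big] \le kC + V(x).$$

Proof. Since $V \ge 1$, Proposition 4.6(ii) applied with $a = 0$, $\beta = \alpha = 1$, $c = 1$ and $\tau = \tau_{\mathcal{D}}$ implies
$$\mathbb{E}^{(l)}_{x,\theta}\big[\tau_{\mathcal{D}}\big] \le V(x) + b\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}-1}\mathbb{1}_C(X_k)\Big].$$
By A3, we have $P_\theta(x,\mathcal{D}) \ge [\varepsilon\nu(\mathcal{D})]\,\mathbb{1}_C(x)$ for any $(x,\theta)$, so that
$$\varepsilon\nu(\mathcal{D})\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}-1}\mathbb{1}_C(X_k)\Big] \le \mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}-1}P_{\theta_k}(X_k,\mathcal{D})\Big] = \mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}-1}\mathbb{1}_{\mathcal{D}}(X_{k+1})\Big] \le 1.$$
Hence $\mathbb{E}^{(l)}_{x,\theta}[\tau_{\mathcal{D}}] \le V(x) + b[\varepsilon\nu(\mathcal{D})]^{-1}$. By the Markov property and Proposition 4.6(i),
$$\mathbb{E}^{(l)}_{x,\theta}\big[\tau^1\big] \le n_\star + V(x) + b[\varepsilon\nu(\mathcal{D})]^{-1} + \mathbb{E}^{(l)}_{x,\theta}\big[V(X_{\tau^0+n_\star})\big] + b[\varepsilon\nu(\mathcal{D})]^{-1} \le n_\star + 2b[\varepsilon\nu(\mathcal{D})]^{-1} + V(x) + \sup_{\mathcal{D}}V + n_\star b.$$
The proof is by induction on $k$. Assume that $\mathbb{E}^{(l)}_{x,\theta}[\tau^k] \le kC + V(x)$ with $C \ge 2b[\varepsilon\nu(\mathcal{D})]^{-1} + \sup_{\mathcal{D}}V + (N \vee n_\star)(1+b)$. Then, using again the Markov property and Proposition 4.6(i), and upon noting that $\mathbb{P}^{(l)}_{x,\theta}(Z_{\tau^k} \in \mathcal{D}\times\Theta) = 1$,
$$\begin{aligned}
\mathbb{E}^{(l)}_{x,\theta}\big[\tau^{k+1}\big] &\le N + b[\varepsilon\nu(\mathcal{D})]^{-1} + \mathbb{E}^{(l)}_{x,\theta}\big[\tau^k\big] + \mathbb{E}^{(l)}_{x,\theta}\big[V(X_{\tau^k+N})\big]\\
&= N + b[\varepsilon\nu(\mathcal{D})]^{-1} + \mathbb{E}^{(l)}_{x,\theta}\big[\tau^k\big] + \mathbb{E}^{(l)}_{x,\theta}\Big[\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}\big[V(X_N)\big]\Big]\\
&\le N + b[\varepsilon\nu(\mathcal{D})]^{-1} + kC + V(x) + \sup_{\mathcal{D}}V + Nb \le (k+1)C + V(x).
\end{aligned}$$
□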
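4.1.4. Generalized Poisson equation. Assume (6). Let $0 < a < 1$, $l \ge 0$ and $0 < \beta < 1 - \alpha$. For $f \in \mathcal{L}_{V^\beta}$ such that $\pi(|f|) < +\infty$, let us define the function
$$g_a^{(l)}(x,\theta) \stackrel{\text{def}}{=} \sum_{j\ge0}(1-a)^{j+1}\,\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\big].$$

Proposition 4.9. Assume (6). Let $0 < \beta < 1 - \alpha$ and $f \in \mathcal{L}_{V^\beta}$. For any $(x,\theta) \in \mathsf{X}\times\Theta$, $l \ge 0$ and $0 < a < 1$, $g_a^{(l)}$ exists, and
$$\bar f(x) = \frac{1}{1-a}\,g_a^{(l)}(x,\theta) - \mathbb{E}^{(l)}_{x,\theta}\big[g_a^{(l+1)}(X_1,\theta_1)\big].$$

Proof. By Proposition 4.6(i), $\big|\mathbb{E}^{(l)}_{x,\theta}[\bar f(X_j)]\big| \le |\bar f|_{V^\beta}\big(V^\beta(x) + bj\big)$. Hence $g_a^{(l)}(x,\theta)$ exists for any $x, \theta, l$. Furthermore, $g_a^{(l+1)}(X_1,\theta_1)$ is $\mathbb{P}^{(l)}_{x,\theta}$-integrable. By definition of $g_a$ and by the Markov property,
$$\mathbb{E}^{(l)}_{x,\theta}\big[g_a^{(l+1)}(X_1,\theta_1)\big] = \sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_{j+1})\big] = (1-a)^{-1}\sum_{j\ge1}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\big] = (1-a)^{-1}\big(g_a^{(l)}(x,\theta) - (1-a)\bar f(x)\big).$$
□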
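In the non-adaptive case ($P_\theta \equiv P$ on a finite state space, so that $g_a^{(l)}$ does not depend on $l$) the series defining $g_a$ has the closed form $g_a = (1-a)(I - (1-a)P)^{-1}\bar f$, and Proposition 4.9 reduces to a linear-algebra identity. The sketch below, with an arbitrarily chosen three-state kernel, checks both the closed form and the identity $\bar f = (1-a)^{-1}g_a - Pg_a$:

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to a probability vector."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def resolvent_solution(P, f, a):
    """g_a = sum_{j>=0} (1-a)^{j+1} P^j f_bar, computed by a linear solve."""
    pi = stationary_distribution(P)
    f_bar = f - pi @ f
    n = P.shape[0]
    g = (1.0 - a) * np.linalg.solve(np.eye(n) - (1.0 - a) * P, f_bar)
    return g, f_bar, pi


def resolvent_series(P, f_bar, a, n_terms=2000):
    """Same quantity obtained by truncating the series, for comparison."""
    g = np.zeros_like(f_bar)
    term = f_bar.copy()                  # holds P^j f_bar
    for j in range(n_terms):
        g += (1.0 - a) ** (j + 1) * term
        term = P @ term
    return g


P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
f = np.array([1.0, 2.0, 5.0])
a = 0.1
g, f_bar, pi = resolvent_solution(P, f, a)
print(np.allclose(f_bar, g / (1.0 - a) - P @ g))   # Proposition 4.9 in this special case
```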



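Theorem 4.10. Assume A3-5 and B2. Let $0 < \beta < 1 - \alpha$. For any $\varepsilon > 0$, there exists an integer $n \ge 2$ such that for any $0 < a < 1$, $f \in \mathcal{L}_{V^\beta}$, $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$ and $q \in [1,+\infty]$,
$$\big(|\bar f|_{V^\beta}\big)^{-1}\big|g_a^{(l)}(x,\theta)\big| \le 4\varepsilon\big(1-(1-a)^n\big)^{-1}n + \frac{a^{1/q-1}(1-a)^{1-1/q}}{(\alpha c)^{1/q}}\Big(1 + b[\varepsilon\nu(\mathcal{D})]^{-1} + 2(1+bn_\star)(1+b)\sup_{\mathcal{D}}V^{\beta+\alpha/q}\Big)V^{\beta+\alpha/q}(x),$$
where the level set $\mathcal{D}$ and the integer $n_\star$ are those constructed in the proof. By convention, $1/q = 0$ when $q = +\infty$. In particular, $\lim_{a\to0}\big(|\bar f|_{V^\beta}\big)^{-1}a\,g_a^{(l)}(x,\theta) = 0$.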



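Remark 1. Before delving into the proof of the theorem, we first make two important remarks. Firstly, a simplified restatement of Theorem 4.10 is the following. There exists a finite constant $c_0$ such that for any $0 < a \le 1/2$, $f \in \mathcal{L}_{V^\beta}$, $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$ and $q \in [1,+\infty]$,
$$\big|g_a^{(l)}(x,\theta)\big| \le c_0\,|\bar f|_{V^\beta}\,a^{-1}\big(1 + a^{1/q}V^{\beta+\alpha/q}(x)\big). \qquad (7)$$
This follows by taking $\varepsilon = 1$, say, and upon noting that $n\big(1-(1-a)^n\big)^{-1} \le 2^{n-1}/a$. The second point is that if we take $a_1, a_2 \in (0,1)$, we can write
$$g_{a_1}^{(l)}(x,\theta) - g_{a_2}^{(l)}(x,\theta) = \frac{a_2 - a_1}{(1-a_1)(1-a_2)}\sum_{k\ge0}(1-a_1)^{k+1}\,\mathbb{E}^{(l)}_{x,\theta}\big[g_{a_2}^{(l+k)}(X_k,\theta_k)\big].$$
By (7) and Proposition 4.6(iii), it holds
$$\big|g_{a_1}^{(l)}(x,\theta) - g_{a_2}^{(l)}(x,\theta)\big| \le c_1\,|\bar f|_{V^\beta}\,|a_2 - a_1|\,a_2^{-1}a_1^{-1+1/q}\,V^{\beta+\alpha/q}(x), \qquad (8)$$
for some finite constant $c_1$, for all $0 < a_1, a_2 \le 1/2$, $f \in \mathcal{L}_{V^\beta}$, $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$ and $q \in [1,+\infty]$.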
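In the same non-adaptive, finite-state setting the identity used in the second point can be checked numerically, since there $\sum_{k\ge0}(1-a_1)^{k+1}\mathbb{E}[g_{a_2}(X_k)] = \sum_{k\ge0}(1-a_1)^{k+1}P^kg_{a_2}$. The kernel, the test vector and the values of $a_1, a_2$ below are arbitrary choices for illustration:

```python
import numpy as np


def g_a(P, f_bar, a):
    """Closed form of g_a = sum_{j>=0} (1-a)^{j+1} P^j f_bar for a fixed kernel P."""
    n = P.shape[0]
    return (1.0 - a) * np.linalg.solve(np.eye(n) - (1.0 - a) * P, f_bar)


def rhs_of_identity(P, f_bar, a1, a2, n_terms=3000):
    """(a2 - a1) / ((1-a1)(1-a2)) * sum_{k>=0} (1-a1)^{k+1} P^k g_{a2}, truncated series."""
    g2 = g_a(P, f_bar, a2)
    total = np.zeros_like(g2)
    term = g2.copy()                     # holds P^k g_{a2}
    for k in range(n_terms):
        total += (1.0 - a1) ** (k + 1) * term
        term = P @ term
    return (a2 - a1) / ((1.0 - a1) * (1.0 - a2)) * total


P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
f_bar = np.array([1.0, -2.0, 1.0])       # arbitrary test vector; the identity is pure resolvent algebra
a1, a2 = 0.05, 0.2
print(np.allclose(g_a(P, f_bar, a1) - g_a(P, f_bar, a2), rhs_of_identity(P, f_bar, a1, a2)))
```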

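Proof. Let $\varepsilon > 0$. Let us consider the sequence of stopping times $\{\tau^k, k \ge 0\}$ defined in Section 4.1.3, where $(\mathcal{D}, N, n_\star)$ are defined below.

Choice of $\mathcal{D}$, $N$, $n_\star$. Choose a level set $\mathcal{D}$ of $V$ large enough so that $\nu(\mathcal{D}) > 0$. Choose $N$ such that
$$\frac{1}{N}\sum_{j=0}^{N-1}\sup_{(x,\theta)\in\mathcal{D}\times\Theta}\|P^j_\theta(x,\cdot) - \pi(\cdot)\|_{V^\beta} \le \varepsilon, \qquad (9)$$
the existence of which is given by A5, and such that (since $\alpha + \beta < 1$)
$$(\alpha c)^{-1}N^{-1}\Big(\sup_{\mathcal{D}}V^{\beta+\alpha} + bN^{\beta+\alpha} + b[\varepsilon\nu(\mathcal{D})]^{-1}\Big) \le \varepsilon. \qquad (10)$$
Set $\epsilon_N = N^{-2}\big\{\varepsilon\big(\sup_{\mathcal{D}}V + bN\big)^{-\beta}\big\}^{1/(1-\beta)}$ (which can be assumed to be strictly lower than $N$ since $\beta > 0$). By B2, choose $n_\star$ such that for any $q \ge n_\star$ and $l \ge 0$,
$$\sup_{(x,\theta)\in\mathcal{D}\times\Theta}\mathbb{P}^{(l)}_{x,\theta}\big(D(\theta_q,\theta_{q-1}) \ge \epsilon_N/2\big) \le \epsilon_N/4.$$
By Proposition 4.8, $\mathbb{P}^{(l)}_{x,\theta}(\tau^k < +\infty) = 1$ for any $(x,\theta) \in \mathsf{X}\times\Theta$, $l \ge 0$, $k \ge 0$.

Optimal coupling. With these definitions,
$$\sup_{j\ge1}\sup_{k\ge1}\mathbb{E}^{(l)}_{x,\theta}\Big[\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}\big[D(\theta_j,\theta_{j-1})\big]\Big] \le \epsilon_N,$$
upon noting that $\mathbb{P}^{(l)}_{x,\theta}(n_\star \le \tau^k) = 1$ and $D(\theta,\theta') \le 2$. We apply Proposition 4.3 and set $\mathcal{E}_N \stackrel{\text{def}}{=} \{X_k = \tilde X_k, 0 \le k \le N\}$. We have, for any $l \ge 0$, $k \ge 1$, $(x,\theta) \in \mathsf{X}\times\Theta$,
$$\mathbb{E}^{(l)}_{x,\theta}\Big[\bar{\mathbb{P}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\big(\mathcal{E}_N^c\big)\Big] \le \sum_{j=1}^{N-1}\sum_{i=1}^{j}\mathbb{E}^{(l)}_{x,\theta}\Big[\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}\big[D(\theta_i,\theta_{i-1})\big]\Big] \le N^2\epsilon_N < 1. \qquad (11)$$
Observe that $\mathcal{D}$, $N$ and $n_\star$ do not depend upon $a$, $l$, $x$, $\theta$ and $f$.

Proof of Theorem 4.10. Assume that for any $0 < a < 1$, $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$ and $k \ge 1$,
$$\Big|\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{N-1}(1-a)^{\tau^k+j+1}\bar f(X_{\tau^k+j})\Big]\Big| \le |\bar f|_{V^\beta}\,3N\varepsilon\,(1-a)^{n_\star+(k-1)N}. \qquad (12)$$
We have
$$g_a^{(l)}(x,\theta) = \sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{j<\tau^1}\big] + \sum_{k\ge1}\sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{\tau^k\le j<\tau^{k+1}}\big].$$
On one hand, by Proposition 4.6(iii) applied with $\tau = \tau_{\mathcal{D}}$ and Proposition 4.8,
$$\Big|\sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{j<\tau_{\mathcal{D}}}\big]\Big| \le |\bar f|_{V^\beta}\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{\tau_{\mathcal{D}}-1}(1-a)^{j+1}V^\beta(X_j)\Big] \le |\bar f|_{V^\beta}\,\frac{a^{1/q-1}(1-a)^{1-1/q}}{(\alpha c)^{1/q}}\,V^{\beta+\alpha/q}(x)\big(1 + b[\varepsilon\nu(\mathcal{D})]^{-1}\big).$$
Applied with $\tau = \tau_{\mathcal{D}}$, Propositions 4.6(i) and (iii) and 4.8 yield
$$\begin{aligned}
|\bar f|_{V^\beta}^{-1}\Big|\sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{\tau_{\mathcal{D}}\le j<\tau^1}\big]\Big| &\le \mathbb{E}^{(l)}_{x,\theta}\Big[\mathbb{E}^{(\tau_{\mathcal{D}}+l)}_{Z_{\tau_{\mathcal{D}}}}\Big[\sum_{j=0}^{n_\star-1}(1-a)^{j+1}V^\beta(X_j)\Big] + \mathbb{E}^{(\tau_{\mathcal{D}}+n_\star+l)}_{Z_{\tau_{\mathcal{D}}+n_\star}}\Big[\sum_{j=0}^{\tau_{\mathcal{D}}-1}(1-a)^{j+1}V^\beta(X_j)\Big]\Big]\\
&\le 2\,\frac{a^{1/q-1}(1-a)^{1-1/q}}{(\alpha c)^{1/q}}\,(1+bn_\star)(1+b)\sup_{\mathcal{D}}V^{\beta+\alpha/q}.
\end{aligned}$$
For $k \ge 1$,
$$\Big|\sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{\tau^k\le j<\tau^{k+1}}\big]\Big| \le \Big|\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=\tau^k}^{\tau^k+N-1}(1-a)^{j+1}\bar f(X_j)\Big]\Big| + |\bar f|_{V^\beta}\,\mathbb{E}^{(l)}_{x,\theta}\Big[(1-a)^{\tau^k+N}\,\mathbb{E}^{(\tau^k+N+l)}_{Z_{\tau^k+N}}\Big[\sum_{j=0}^{\tau_{\mathcal{D}}-1}(1-a)^{j+1}V^\beta(X_j)\Big]\Big].$$
By Proposition 4.6(i) and (ii) applied with $\tau = \tau_{\mathcal{D}}$, Proposition 4.8 and Eq. (12), and upon noting that $\tau^k \ge n_\star + (k-1)N$ $\mathbb{P}^{(l)}_{x,\theta}$-a.s.,
$$\begin{aligned}
\Big|\sum_{j\ge0}(1-a)^{j+1}\mathbb{E}^{(l)}_{x,\theta}\big[\bar f(X_j)\mathbb{1}_{\tau^k\le j<\tau^{k+1}}\big]\Big| &\le |\bar f|_{V^\beta}\,\mathbb{E}^{(l)}_{x,\theta}\Big[(1-a)^{n_\star+(k-1)N}\Big(3N\varepsilon + (1-a)^N\big\{V^{\beta+\alpha}(X_{\tau^k+N}) + b[\varepsilon\nu(\mathcal{D})]^{-1}\big\}(\alpha c)^{-1}\Big)\Big]\\
&\le |\bar f|_{V^\beta}(1-a)^{n_\star+(k-1)N}\Big(3N\varepsilon + (\alpha c)^{-1}\sup_{l,\,(x,\theta)\in\mathcal{D}\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[V^{\beta+\alpha}(X_N)\big] + b[\varepsilon\nu(\mathcal{D})]^{-1}(\alpha c)^{-1}\Big)\\
&\le |\bar f|_{V^\beta}(1-a)^{n_\star+(k-1)N}\Big(3N\varepsilon + (\alpha c)^{-1}\Big(\sup_{\mathcal{D}}V^{\beta+\alpha} + bN^{\beta+\alpha} + b[\varepsilon\nu(\mathcal{D})]^{-1}\Big)\Big)\\
&\le 4\varepsilon\,|\bar f|_{V^\beta}(1-a)^{(k-1)N}N,
\end{aligned}$$
where we used the definition of $N$ (see Eq. (10)) and Proposition 4.6(i). Summing over $k \ge 1$ yields the desired result.

Proof of Eq. (12). By the strong Markov property and since $\tau^k \ge n_\star + N(k-1)$ $\mathbb{P}^{(l)}_{x,\theta}$-a.s.,
$$\Big|\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{j=0}^{N-1}(1-a)^{\tau^k+j+1}\bar f(X_{\tau^k+j})\Big]\Big| \le (1-a)^{n_\star+N(k-1)}\,\mathbb{E}^{(l)}_{x,\theta}\Big[\Big|\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}\Big[\sum_{j=0}^{N-1}(1-a)^{j+1}\bar f(X_j)\Big]\Big|\Big].$$
Furthermore, by Proposition 4.3,
$$\Big|\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}\Big[\sum_{j=0}^{N-1}(1-a)^{j+1}\bar f(X_j)\Big]\Big| \le \Big|\sum_{j=0}^{N-1}(1-a)^{j+1}P^j_{\theta_{\tau^k}}\bar f(X_{\tau^k})\Big| + \bar{\mathbb{E}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\Big[\Big|\sum_{j=0}^{N-1}(1-a)^{j+1}\big\{\bar f(X_j) - \bar f(\tilde X_j)\big\}\Big|\mathbb{1}_{\mathcal{E}_N^c}\Big].$$
On one hand, we have, $\mathbb{P}^{(l)}_{x,\theta}$-a.s.,
$$\Big|\sum_{j=0}^{N-1}(1-a)^{j+1}P^j_{\theta_{\tau^k}}\bar f(X_{\tau^k})\Big| \le |\bar f|_{V^\beta}\sum_{j=0}^{N-1}(1-a)^{j+1}\sup_{(x,\theta)\in\mathcal{D}\times\Theta}\|P^j_\theta(x,\cdot) - \pi(\cdot)\|_{V^\beta} \le |\bar f|_{V^\beta}\,N\varepsilon$$
by (9). On the other hand, $\mathbb{P}^{(l)}_{x,\theta}$-a.s.,
$$\begin{aligned}
\bar{\mathbb{E}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\Big[\Big|\sum_{j=0}^{N-1}(1-a)^{j+1}\big\{\bar f(X_j) - \bar f(\tilde X_j)\big\}\Big|\mathbb{1}_{\mathcal{E}_N^c}\Big] &\le |\bar f|_{V^\beta}\,\bar{\mathbb{E}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\Big[\sum_{j=0}^{N-1}(1-a)^{j+1}\big\{V^\beta(X_j) + V^\beta(\tilde X_j)\big\}\mathbb{1}_{\mathcal{E}_N^c}\Big]\\
&\le |\bar f|_{V^\beta}\Big(\bar{\mathbb{E}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\Big[\Big(\sum_{j=0}^{N-1}(1-a)^{j+1}\big[V^\beta(X_j) + V^\beta(\tilde X_j)\big]\Big)^{1/\beta}\Big]\Big)^{\beta}\Big(\bar{\mathbb{P}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\big(\mathcal{E}_N^c\big)\Big)^{1-\beta}
\end{aligned}$$
by Hölder's inequality ($\beta < 1$). By the Minkowski inequality, by Proposition 4.6(i), and by iterating the drift inequality A4,
$$\begin{aligned}
\Big(\bar{\mathbb{E}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\Big[\Big(\sum_{j=0}^{N-1}(1-a)^{j+1}\big\{V^\beta(X_j) + V^\beta(\tilde X_j)\big\}\Big)^{1/\beta}\Big]\Big)^{\beta} &\le \sum_{j=0}^{N-1}(1-a)^{j+1}\Big\{\big(\mathbb{E}^{(\tau^k+l)}_{Z_{\tau^k}}[V(X_j)]\big)^\beta + \big(P^j_{\theta_{\tau^k}}V(X_{\tau^k})\big)^\beta\Big\}\\
&\le \sum_{j=0}^{N-1}(1-a)^{j+1}\Big\{\sup_{l,\,(x,\theta)\in\mathcal{D}\times\Theta}\big(\mathbb{E}^{(l)}_{x,\theta}[V(X_j)]\big)^\beta + \Big(\sup_{(x,\theta)\in\mathcal{D}\times\Theta}P^j_\theta V(x)\Big)^\beta\Big\}\\
&\le 2\sum_{j=0}^{N-1}(1-a)^{j+1}\Big(\sup_{\mathcal{D}}V + jb\Big)^\beta \le 2N\Big(\sup_{\mathcal{D}}V + bN\Big)^\beta.
\end{aligned}$$
Finally,
$$\mathbb{E}^{(l)}_{x,\theta}\Big[\bar{\mathbb{P}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\big(\mathcal{E}_N^c\big)^{1-\beta}\Big] \le \Big(\mathbb{E}^{(l)}_{x,\theta}\Big[\bar{\mathbb{P}}^{(\tau^k+l)}_{Z_{\tau^k},Z_{\tau^k}}\big(\mathcal{E}_N^c\big)\Big]\Big)^{1-\beta} \le \big(N^2\epsilon_N\big)^{1-\beta},$$
where we used (11) in the last inequality. To conclude the proof, use the definition of $\epsilon_N$. □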
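4.2. Proof of Theorem 2.1. Let $\varepsilon > 0$. We prove that there exists $n_\varepsilon$ such that for any $n \ge n_\varepsilon$, $\sup_{\{f,\,|f|_1\le1\}}\big|\mathbb{E}_{\xi_1,\xi_2}[\bar f(X_n)]\big| \le \varepsilon$.

4.2.1. Definition of $\mathcal{D}$, $N$, $Q$ and $n_\star$. By A1(i), choose $Q$ such that
$$\sup_l\sup_{(x,\theta)\in C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[r(\tau_C)\big]\sum_{k\ge Q}\frac{1}{r(k)} \le \varepsilon. \qquad (13)$$
By A1(ii), choose $N$ such that
$$\sup_{(x,\theta)\in\mathsf{X}\times\Theta}V^{-1}(x)\,\|P^N_\theta(x,\cdot) - \pi(\cdot)\|_{TV} \le \varepsilon Q^{-1}. \qquad (14)$$
By B1, choose $n_\star$ such that for any $n \ge n_\star$,
$$\mathbb{P}_{\xi_1,\xi_2}\Big(D(\theta_n,\theta_{n-1}) \ge \frac{\varepsilon}{2(N+Q-1)^2Q}\Big) \le \frac{\varepsilon}{4(N+Q-1)^2Q}. \qquad (15)$$

4.2.2. Optimal coupling. We apply Proposition 4.3 with $N \leftarrow N + Q$. Set $\mathcal{E}_{N+Q} \stackrel{\text{def}}{=} \{X_k = \tilde X_k, 0 \le k \le N+Q\}$. It holds, for any $r \ge n_\star$,
$$\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_r\in C}\,\bar{\mathbb{P}}^{(r)}_{Z_r,Z_r}\big(\mathcal{E}_{N+Q}^c\big)\Big] \le \sum_{j=1}^{N+Q-1}\sum_{i=1}^{j}\mathbb{E}_{\xi_1,\xi_2}\big[D(\theta_{r+i},\theta_{r+i-1})\big] \le \varepsilon Q^{-1}, \qquad (16)$$
where, in the last inequality, we use that $D(\theta,\theta') \le 2$ and the definition of $n_\star$ (see Eq. (15)).

4.2.3. Proof. Let $n \ge N + Q + n_\star$. We consider the partition given by the last exit from the set $C$ before time $n - N$. We use the notation $\{X_{n:m} \notin C\}$ as a shorthand notation for $\bigcap_{k=n}^{m}\{X_k \notin C\}$, with the convention that $\{X_{m+1:m} \notin C\} = \Omega$. We write
$$\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\big] = \mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_{0:n-N}\notin C}\big] + \sum_{k=0}^{n-N}\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_k\in C}\mathbb{1}_{X_{k+1:n-N}\notin C}\big].$$
Since $\bar f$ is bounded on $\mathsf{X}$ by $|\bar f|_1$, we have
$$\big|\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_{0:n-N}\notin C}\big]\big| \le |\bar f|_1\,\mathbb{P}_{\xi_1,\xi_2}(\tau_C > n-N) \le |\bar f|_1\,\mathbb{E}_{\xi_1,\xi_2}\Big[\frac{\tau_C}{n-N}\wedge1\Big].$$
The rhs is upper bounded by $|\bar f|_1\varepsilon$ for $n$ large enough. By definition of $Q$ in (13),
$$\sum_{k=0}^{n-(N+Q)}\big|\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_k\in C}\mathbb{1}_{X_{k+1:n-N}\notin C}\big]\big| \le |\bar f|_1\sum_{k=0}^{n-(N+Q)}\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_k\in C}\,\mathbb{P}^{(k)}_{Z_k}(\tau_C > n-N-k)\Big] \le |\bar f|_1\sup_l\sup_{(x,\theta)\in C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[r(\tau_C)\big]\sum_{k\ge Q}\frac{1}{r(k)} \le |\bar f|_1\varepsilon. \qquad (17)$$
Let $k \in \{n-(N+Q)+1,\dots,n-N\}$. By definition of $N$ and $n_\star$ (see Eqs. (14) and (15)), upon noting that $k \ge n-(N+Q) \ge n_\star$, and denoting by $\{\tilde X_j\}$ the homogeneous chain with kernel $P_{\theta_k}$ started at $X_k$,
$$\begin{aligned}
\big|\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_k\in C}\mathbb{1}_{X_{k+1:n-N}\notin C}\big]\big| &\le \mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_k\in C}\Big|\bar{\mathbb{E}}^{(k)}_{Z_k,Z_k}\big[\bar f(\tilde X_{n-k})\mathbb{1}_{\tilde X_{1:n-N-k}\notin C}\big]\Big|\Big] + 2|\bar f|_1\,\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_k\in C}\,\bar{\mathbb{P}}^{(k)}_{Z_k,Z_k}\big(\mathcal{E}_{N+Q}^c\big)\Big]\\
&\le \mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_k\in C}\Big|\bar{\mathbb{E}}^{(k)}_{Z_k,Z_k}\big[\mathbb{1}_{\tilde X_{1:n-N-k}\notin C}\,P^N_{\theta_k}\bar f(\tilde X_{n-N-k})\big]\Big|\Big] + 2|\bar f|_1\varepsilon Q^{-1}\\
&\le |\bar f|_1\varepsilon Q^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_{X_k\in C}\,\bar{\mathbb{E}}^{(k)}_{Z_k,Z_k}\big[\mathbb{1}_{\tilde X_{1:n-N-k}\notin C}\,V(\tilde X_{n-N-k})\big]\Big] + 2|\bar f|_1\varepsilon Q^{-1}\\
&\le |\bar f|_1\varepsilon Q^{-1}\Big\{\sup_{(x,\theta)\in C\times\Theta}P_\theta V(x) + \sup_C V\Big\} + 2|\bar f|_1\varepsilon Q^{-1},
\end{aligned}$$
where we used A1(iii) in the last inequality. Hence,
$$\sum_{k=n-(N+Q)+1}^{n-N}\big|\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_k\in C}\mathbb{1}_{X_{k+1:n-N}\notin C}\big]\big| \le \Big(2 + \sup_{(x,\theta)\in C\times\Theta}P_\theta V(x) + \sup_C V\Big)\varepsilon\,|\bar f|_1.$$
This concludes the proof.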

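Remark 2. In the case where the process is non-adaptive, we can assume w.l.g. that it possesses an atom $\alpha$; in that case, the lines (17) can be modified so that the assumption $\sum_n\{1/r(n)\} < +\infty$ can be removed. In the case of an atomic chain, we can indeed apply the above computations with $C$ replaced by $\alpha$ and write:
$$\sum_{k=0}^{n-(N+Q)}\big|\mathbb{E}_{\xi_1,\xi_2}\big[\bar f(X_n)\mathbb{1}_{X_k\in\alpha}\mathbb{1}_{X_{k+1:n-N}\notin\alpha}\big]\big| \le |\bar f|_1\sum_{k=0}^{n-(N+Q)}\mathbb{P}_\alpha(\tau_\alpha > n-N-k) \le |\bar f|_1\sum_{k\ge Q}\mathbb{P}_\alpha(\tau_\alpha > k).$$
The rhs is small for convenient $Q$, provided $\mathbb{E}_\alpha[r(\tau_\alpha)] < +\infty$ with $r(n) = n$. Unfortunately, the adaptive chain $\{(X_n,\theta_n), n \ge 0\}$ does not possess an atom, thus explaining the condition on $r$.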



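4.3. Proof of Corollary 2.2. The condition A1(ii) is established in Appendix A. Let $\mathcal{D}$ be a level set large enough such that $\nu(\mathcal{D}) > 0$; then Proposition 4.8 implies that there exists a constant $c < \infty$ such that for any $l \ge 0$, $\mathbb{E}^{(l)}_{x,\theta}[\tau_{\mathcal{D}}] \le cV(x)$. This implies that, for $0 < \eta < 1 - \alpha$,
$$\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}}(k+1)^\eta\Big] \le \mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}}\big(1 + \tau_{\mathcal{D}}\circ\theta^k\big)^\eta\Big] \le c''\,\mathbb{E}^{(l)}_{x,\theta}\Big[\sum_{k=0}^{\tau_{\mathcal{D}}}V^{1-\alpha}(X_k)\Big] \le C\big(V(x) + b\,\mathbb{E}^{(l)}_{x,\theta}[\tau_{\mathcal{D}}]\big) \le C'V(x),$$
for some finite constants $C, C'$ independent upon $\theta$. Hence A1(i) holds with $r(n) \sim n^{1+\eta}$. Finally, $P_\theta V \le V - cV^{1-\alpha} + b\mathbb{1}_C$ implies $P_\theta V \le V - c\gamma V^{1-\alpha} + b\mathbb{1}_{\mathcal{D}}$ for any $\gamma \in (0,1)$ and the level set $\mathcal{D} \stackrel{\text{def}}{=} \{x, V^{1-\alpha}(x) \le b[c(1-\gamma)]^{-1}\}$. This yields A1(iii).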
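The last implication is a one-line computation, written out here because the same splitting trick reappears in the proof of Corollary A.2: divide $cV^{1-\alpha}$ into $\gamma cV^{1-\alpha}$ and $(1-\gamma)cV^{1-\alpha}$, and use the second part to absorb $b\mathbb{1}_C$ outside $\mathcal{D}$:

```latex
\begin{align*}
P_\theta V &\le V - \gamma c\,V^{1-\alpha}
  + \big(b\,\mathbb{1}_C - (1-\gamma)c\,V^{1-\alpha}\big),\\
b\,\mathbb{1}_C(x) - (1-\gamma)c\,V^{1-\alpha}(x) &\le
\begin{cases}
  b, & x \in \mathcal{D},\\
  b - (1-\gamma)c\,V^{1-\alpha}(x) < 0, & x \notin \mathcal{D},
\end{cases}
\end{align*}
```

since $V^{1-\alpha}(x) > b[c(1-\gamma)]^{-1}$ off $\mathcal{D}$. Hence $P_\theta V \le V - \gamma cV^{1-\alpha} + b\mathbb{1}_{\mathcal{D}}$.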

4.4. Proof of Proposition 2.4. Under A2, there exists a constant $C$ (that does not depend upon $\theta$) such that for any $(x,\theta) \in \mathsf{X}\times\Theta$, $n \ge 0$ and $\kappa \in [1,\alpha^{-1}]$,
$$(n+1)^{\kappa-1}\,\|P^n_\theta(x,\cdot) - \pi(\cdot)\|_{TV} \le C\,V^{\kappa\alpha}(x)$$
(see Appendix A). To apply (Roberts and Rosenthal, 2007, Theorem 13), we only have to prove that there exists $\kappa \in [1,\alpha^{-1}]$ such that the sequence $\{V^{\kappa\alpha}(X_n); n \ge 0\}$ is bounded in probability, which is equivalent to proving that $\{V^\beta(X_n); n \ge 0\}$ is bounded in probability for some (and thus any) $\beta \in (0,1]$. This is a consequence of Lemma 4.11 applied with $W = V^\beta$ for some $\beta \in (0,1]$ and $r(n) = (n+1)^{1+\eta}$ for some $\eta > 0$ (see the proof of Corollary 2.2 for similar computations).

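Lemma 4.11. Assume that there exist a set $C$ and functions $W : \mathsf{X} \to (0,+\infty)$ and $r : \mathbb{N} \to (0,+\infty)$ such that $r$ is non-decreasing, $P_\theta W \le W$ on $C^c$ and
$$\sup_{C\times\Theta}P_\theta W < +\infty, \qquad \sup_l\sup_{C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[r(\tau_C)\big] < +\infty, \qquad \sum_k\{1/r(k)\} < +\infty.$$
For any probability distributions $\xi_1, \xi_2$ on $\mathsf{X}$ and $\Theta$ respectively, $\{W(X_n), n \ge 0\}$ is bounded in probability for the probability $\mathbb{P}_{\xi_1,\xi_2}$.

Proof. Let $\varepsilon > 0$. We prove that there exist $M_\varepsilon, N_\varepsilon$ such that for any $M \ge M_\varepsilon$ and $n \ge N_\varepsilon$, $\mathbb{P}_{\xi_1,\xi_2}(W(X_n) \ge M) \le \varepsilon$. Choose $N_\varepsilon$ such that for any $n \ge N_\varepsilon$,
$$\mathbb{E}_{\xi_1,\xi_2}\Big[\frac{\tau_C}{n}\wedge1\Big] \le \varepsilon/3, \qquad \sup_l\sup_{C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[r(\tau_C)\big]\sum_{k\ge n}\{1/r(k)\} \le \varepsilon/3,$$
and choose $M_\varepsilon$ such that for any $M \ge M_\varepsilon$, $M^{-1}N_\varepsilon\sup_{C\times\Theta}P_\theta W \le \varepsilon/3$. We write
$$\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M\big) = \sum_{k=0}^{n-1}\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_k \in C, X_{k+1:n} \notin C\big) + \mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_{0:n} \notin C\big).$$
By the Markov inequality, for $n \ge N_\varepsilon$,
$$\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_{0:n} \notin C\big) \le \mathbb{P}_{\xi_1,\xi_2}\big(X_{0:n} \notin C\big) \le \mathbb{P}_{\xi_1,\xi_2}(\tau_C > n) \le \mathbb{E}_{\xi_1,\xi_2}\Big[\frac{\tau_C}{n}\wedge1\Big] \le \varepsilon/3.$$
Furthermore, for $n \ge N_\varepsilon$,
$$\begin{aligned}
\sum_{k=0}^{n-N_\varepsilon}\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_k \in C, X_{k+1:n} \notin C\big) &\le \sum_{k=0}^{n-N_\varepsilon}\mathbb{P}_{\xi_1,\xi_2}\big(X_k \in C, X_{k+1:n} \notin C\big)\\
&\le \sum_{k=0}^{n-N_\varepsilon}\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_C(X_k)\sup_l\sup_{C\times\Theta}\mathbb{P}^{(l)}_{x,\theta}\big(X_{1:n-k} \notin C\big)\Big]\\
&\le \sum_{k=0}^{n-N_\varepsilon}\sup_l\sup_{C\times\Theta}\mathbb{P}^{(l)}_{x,\theta}(\tau_C > n-k) \le \sum_{k\ge N_\varepsilon}\frac{1}{r(k)}\sup_l\sup_{C\times\Theta}\mathbb{E}^{(l)}_{x,\theta}\big[r(\tau_C)\big] \le \varepsilon/3.
\end{aligned}$$
Finally, for $n \ge N_\varepsilon$ we write
$$\sum_{k=n-N_\varepsilon+1}^{n-1}\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_k \in C, X_{k+1:n} \notin C\big) \le \sum_{k=n-N_\varepsilon+1}^{n-1}\mathbb{E}_{\xi_1,\xi_2}\Big[\mathbb{1}_C(X_k)\,\mathbb{P}^{(k)}_{Z_k}\big(W(X_{n-k}) \ge M, X_{1:n-k} \notin C\big)\Big].$$
We have, for any $k \in \{n-N_\varepsilon+1,\dots,n-1\}$ and $(x,\theta) \in C\times\Theta$,
$$\mathbb{P}^{(k)}_{x,\theta}\big(W(X_{n-k}) \ge M, X_{1:n-k} \notin C\big) \le M^{-1}\,\mathbb{E}^{(k)}_{x,\theta}\big[W(X_{n-k})\mathbb{1}_{X_{1:n-k}\notin C}\big] \le M^{-1}P_\theta W(x),$$
where, in the last inequality, we used the drift inequality on $W$ outside $C$. Hence,
$$\sum_{k=n-N_\varepsilon+1}^{n-1}\mathbb{P}_{\xi_1,\xi_2}\big(W(X_n) \ge M, X_k \in C, X_{k+1:n} \notin C\big) \le \frac{N_\varepsilon}{M}\sup_{C\times\Theta}P_\theta W(x) \le \varepsilon/3.$$
The proof is concluded. □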



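4.5. Proof of Theorem 2.5. By using the function $g_a$ introduced in Section 4.1.4 and by Proposition 4.9, we write, $\mathbb{P}_{x,\theta}$-a.s.,
$$\begin{aligned}
n^{-1}\sum_{k=1}^n\bar f(X_k) &= n^{-1}\sum_{k=1}^n\Big\{(1-a)^{-1}g_a^{(k)}(X_k,\theta_k) - \mathbb{E}_{x,\theta}\big[g_a^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k\big]\Big\}\\
&= n^{-1}(1-a)^{-1}\sum_{k=1}^n\Big\{g_a^{(k)}(X_k,\theta_k) - \mathbb{E}_{x,\theta}\big[g_a^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}\big]\Big\}\\
&\quad + n^{-1}(1-a)^{-1}\sum_{k=1}^n\Big\{\mathbb{E}_{x,\theta}\big[g_a^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}\big] - (1-a)\,\mathbb{E}_{x,\theta}\big[g_a^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k\big]\Big\}\\
&= n^{-1}(1-a)^{-1}\sum_{k=1}^n\Big\{g_a^{(k)}(X_k,\theta_k) - \mathbb{E}_{x,\theta}\big[g_a^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}\big]\Big\}\\
&\quad + n^{-1}(1-a)^{-1}\Big\{\mathbb{E}_{x,\theta}\big[g_a^{(1)}(X_1,\theta_1)\,|\,\mathcal{F}_0\big] - \mathbb{E}_{x,\theta}\big[g_a^{(n+1)}(X_{n+1},\theta_{n+1})\,|\,\mathcal{F}_n\big]\Big\}\\
&\quad + a\,n^{-1}(1-a)^{-1}\sum_{k=1}^n\mathbb{E}_{x,\theta}\big[g_a^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k\big].
\end{aligned}$$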
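The last equality is a purely algebraic rearrangement (a telescoping of the conditional expectations). The sketch below checks it on arbitrary numbers, writing `g[k]` for $g_a^{(k)}(X_k,\theta_k)$ and `E[k]` for $\mathbb{E}[g_a^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k]$, so that $\mathbb{E}[g_a^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}]$ is `E[k - 1]`; the function name is hypothetical:

```python
import random


def decomposition_gap(g, E, a):
    """lhs - rhs of the three-part decomposition above, for arbitrary inputs.

    g[k], k = 1..n, stands for g_a^{(k)}(X_k, theta_k) (g[0] is unused);
    E[k], k = 0..n, stands for E[g_a^{(k+1)}(X_{k+1}, theta_{k+1}) | F_k].
    """
    n = len(g) - 1
    lhs = sum(g[k] / (1 - a) - E[k] for k in range(1, n + 1)) / n
    martingale = sum(g[k] - E[k - 1] for k in range(1, n + 1)) / (n * (1 - a))
    boundary = (E[0] - E[n]) / (n * (1 - a))
    drift = a * sum(E[k] for k in range(1, n + 1)) / (n * (1 - a))
    return lhs - (martingale + boundary + drift)


rng = random.Random(1)
g = [rng.gauss(0, 1) for _ in range(51)]
E = [rng.gauss(0, 1) for _ in range(51)]
print(decomposition_gap(g, E, a=0.2))   # zero up to floating-point rounding
```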
We apply the above decomposition with $a = a_n$ and consider the different terms in turn. We show that they tend $\mathbb{P}_{x,\theta}$-a.s. to zero when the deterministic sequence $\{a_n, n \ge 1\}$ satisfies conditions which are verified e.g. with $a_n = (n+1)^{-\zeta}$ for some $\zeta$ such that
$$\zeta > 0, \qquad 2\zeta < 1 - \big(0.5 \vee \beta(1-\alpha)^{-1}\big), \qquad \zeta < 1 - \beta(1-\alpha)^{-1}.$$
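The admissible range of $\zeta$ can be read off mechanically. The following sketch simply transcribes the three displayed conditions (the helper names are hypothetical) and returns the upper end of the open interval of admissible exponents, together with the resulting step sizes:

```python
def admissible_zeta_bound(alpha, beta):
    """Upper end zeta_max of the interval (0, zeta_max) allowed by the displayed conditions.

    Conditions: zeta > 0, 2*zeta < 1 - max(0.5, beta/(1-alpha)), zeta < 1 - beta/(1-alpha).
    Requires 0 < alpha < 1 and 0 < beta < 1 - alpha.
    """
    if not (0 < alpha < 1 and 0 < beta < 1 - alpha):
        raise ValueError("need 0 < alpha < 1 and 0 < beta < 1 - alpha")
    ratio = beta / (1.0 - alpha)        # equals 1/p with p = (1 - alpha)/beta
    return min((1.0 - max(0.5, ratio)) / 2.0, 1.0 - ratio)


def step_sizes(zeta, n_max):
    """a_n = (n + 1)^(-zeta) for n = 1, ..., n_max."""
    return [(n + 1) ** (-zeta) for n in range(1, n_max + 1)]


zmax = admissible_zeta_bound(alpha=0.5, beta=0.1)
print(zmax, step_sizes(zmax / 2, 5))
```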

To prove that each term converges a.s. to zero, we use the following characterization:
$$\{X_n, n \ge 0\} \to 0 \ \text{a.s.} \iff \forall\,\varepsilon > 0,\ \lim_{n\to\infty}\mathbb{P}\Big(\sup_{m\ge n}|X_m| > \varepsilon\Big) = 0.$$
Hereafter, we assume that $|\bar f|_{V^\beta} = 1$. In the following, $c$ (and below, $c_1$, $c_2$) are constants, the value of which may vary upon each appearance.
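Convergence of Term 1. Set $p = (1-\alpha)/\beta$. We prove that
$$n^{-1}(1-a_n)^{-1}\sum_{k=1}^n\Big\{g_{a_n}^{(k)}(X_k,\theta_k) - \mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}\big]\Big\} \to 0, \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.},$$
provided the sequence $\{a_n, n \ge 0\}$ is non-increasing, $\lim_{n\to\infty}n^{\max(1/p,1/2)-1}/a_n = 0$, $\sum_n n^{-1}\big[n^{\max(1/p,1/2)-1}/a_n\big]^p < +\infty$ and $\sum_n|a_n - a_{n-1}|\,a_{n-1}^{-1}\big[n^{\max(1/p,1/2)-1}/a_n\big] < +\infty$.

Proof. Define $D_{n,k} \stackrel{\text{def}}{=} g_{a_n}^{(k)}(X_k,\theta_k) - \mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(k)}(X_k,\theta_k)\,|\,\mathcal{F}_{k-1}\big]$; $S_{n,k} \stackrel{\text{def}}{=} \sum_{j=1}^kD_{n,j}$ if $k \le n$ and $S_{n,k} \stackrel{\text{def}}{=} \sum_{j=1}^nD_{n,j} + \sum_{j=n+1}^kD_{j,j}$ if $k > n$; and $R_n \stackrel{\text{def}}{=} \sum_{j=1}^{n-1}(D_{n,j} - D_{n-1,j})$. Then for each $n$, $\{(S_{n,k},\mathcal{F}_k), k \ge 1\}$ is a martingale. For $k > n$, by Lemma B.1, there exists a universal constant $C$ such that
$$\mathbb{E}_{\xi_1,\xi_2}\big[|S_{n,k}|^p\big] \le C\,k^{(p/2\vee1)-1}\Big(\sum_{j=1}^n\mathbb{E}_{\xi_1,\xi_2}\big[|D_{n,j}|^p\big] + \sum_{j=n+1}^k\mathbb{E}_{\xi_1,\xi_2}\big[|D_{j,j}|^p\big]\Big) \le c_1|\bar f|_{V^\beta}\,k^{(p/2\vee1)-1}a_k^{-p}\sum_{j=1}^k\mathbb{E}_{\xi_1,\xi_2}\big[V^{\beta p}(X_j)\big] \le c_1|\bar f|_{V^\beta}\,k^{(p/2\vee1)}a_k^{-p}\,\xi_1(V), \qquad (18)$$
where we used (7) and Proposition 4.6(ii). It follows that for any $n \ge 1$, $\lim_{N\to\infty}N^{-p}\,\mathbb{E}_{\xi_1,\xi_2}\big[|S_{n,N}|^p\big] \le c_1\lim_{N\to\infty}\big(N^{\max(1/p,1/2)-1}/a_N\big)^p = 0$. Then, by the martingale array extension of the Chow-Birnbaum-Marshall inequality (Lemma B.2, with $c_k = k^{-1}(1-a_k)^{-1}$ and $N \to +\infty$),
$$2^{-p}\delta^p\,\mathbb{P}_{\xi_1,\xi_2}\Big(\sup_{m\ge n}m^{-1}(1-a_m)^{-1}\Big|\sum_{j=1}^mD_{m,j}\Big| > \delta\Big) \le \sum_{k\ge n}\big(c_k^p - c_{k+1}^p\big)\,\mathbb{E}_{\xi_1,\xi_2}\big[|S_{n,k}|^p\big] + \Big(\sum_{k\ge n+1}c_k\,\mathbb{E}_{\xi_1,\xi_2}\big[|R_k|^p\big]^{1/p}\Big)^p.$$
Under the assumptions on the sequence $\{a_n, n \ge 0\}$ and given the bound (18), the first term in the rhs tends to zero as $n \to +\infty$. To bound the second term, we first note that $\{(\sum_{j=1}^k(D_{n,j} - D_{n-1,j}),\mathcal{F}_k), k \ge 1\}$ is a martingale for each $n$. Therefore, by Lemma B.1 and the definition of $D_{n,j}$,
$$\mathbb{E}_{\xi_1,\xi_2}\big[|R_n|^p\big] \le C\,n^{(p/2\vee1)-1}\sum_{j=1}^{n-1}\mathbb{E}_{\xi_1,\xi_2}\big[|D_{n,j} - D_{n-1,j}|^p\big] \le 2^pC\,n^{(p/2\vee1)-1}\sum_{j=1}^{n-1}\mathbb{E}_{\xi_1,\xi_2}\Big[\big|g_{a_n}^{(j)}(X_j,\theta_j) - g_{a_{n-1}}^{(j)}(X_j,\theta_j)\big|^p\Big].$$
Then, using (8) (with $q = \infty$) and the usual argument of bounding moments of $V^\beta(X_j)$, we get
$$\mathbb{E}_{\xi_1,\xi_2}\big[|R_n|^p\big] \le c_1|\bar f|_{V^\beta}\,n^{(p/2\vee1)}\,|a_n - a_{n-1}|^p\,a_n^{-p}a_{n-1}^{-p}\,\xi_1(V).$$
Under the assumptions, $\sum_n n^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\big[|R_n|^p\big]^{1/p} < +\infty$, and this concludes the proof. □

Convergence of Term 2. We prove that
$$n^{-1}(1-a_n)^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(1)}(X_1,\theta_1)\,|\,\mathcal{F}_0\big] \to 0,$$
provided $\lim_n na_n = +\infty$ and $\lim_n a_n = 0$.

Proof. By Theorem 4.10 applied with $q = +\infty$, it may be proved that there exist constants $c, N$ such that
$$\big|\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(1)}(X_1,\theta_1)\,|\,\mathcal{F}_0\big]\big| \le c\,a_n^{-1}\xi_1(V) + c\big(1-(1-a_n)^N\big)^{-1}N.$$
Divided by $n(1-a_n)$, the rhs tends to zero as $n \to +\infty$. □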



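Convergence of Term 3. We prove that
$$n^{-1}(1-a_n)^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(n+1)}(X_{n+1},\theta_{n+1})\,|\,\mathcal{F}_n\big] \to 0, \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.},$$
provided the sequence $\{n^{-1}a_n^{-1}, n \ge 1\}$ is non-increasing, $\lim_n n^{1-1/p}a_n = +\infty$, $\sum_n(na_n)^{-p} < +\infty$ and $\lim_n a_n = 0$.

Proof. There exist constants $c_1, c_2, N$ such that for any $n$ large enough (i.e. such that $1 - a_n \ge 1/2$) and $p \stackrel{\text{def}}{=} (1-\alpha)\beta^{-1} > 1$,
$$\begin{aligned}
&\mathbb{P}_{\xi_1,\xi_2}\Big(\sup_{m\ge n}m^{-1}(1-a_m)^{-1}\big|\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_m}^{(m+1)}(X_{m+1},\theta_{m+1})\,|\,\mathcal{F}_m\big]\big| > \delta\Big)\\
&\quad\le \mathbb{P}_{\xi_1,\xi_2}\Big(\sup_{m\ge n}m^{-1}(1-a_m)^{-1}c_1a_m^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\big[V^\beta(X_{m+1})\,|\,\mathcal{F}_m\big] > \delta/2\Big) + \mathbb{1}\Big\{\sup_{m\ge n}m^{-1}(1-a_m)^{-1}c_2N\big(1-(1-a_m)^N\big)^{-1} > \delta/2\Big\}\\
&\quad\le c\,\delta^{-p}\sum_{m\ge n}m^{-p}a_m^{-p}\,\mathbb{E}_{\xi_1,\xi_2}\big[V^{1-\alpha}(X_{m+1})\big] + \mathbb{1}\Big\{\sup_{m\ge n}2c_2\,m^{-1}N\big(1-(1-a_m)^N\big)^{-1} > \delta/2\Big\},
\end{aligned}$$
where we used Theorem 4.10 with $q = +\infty$. Furthermore, by Propositions 4.6(i) and 4.7 and the drift inequality,
$$\sum_{m\ge n}m^{-p}a_m^{-p}\,\mathbb{E}_{\xi_1,\xi_2}\big[V^{1-\alpha}(X_{m+1})\big] \le c\Big\{n^{-p}a_n^{-p}\big(\xi_1(V) + bn\big) + \sum_{m\ge n}(ma_m)^{-p}\Big\}.$$
Under the stated conditions on $\{a_n, n \ge 1\}$, the rhs tends to zero as $n \to +\infty$. □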



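Convergence of Term 4. We prove that
$$a_nn^{-1}(1-a_n)^{-1}\sum_{k=1}^n\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k\big] \to 0, \quad \mathbb{P}_{\xi_1,\xi_2}\text{-a.s.},$$
provided $\{a_n^{1\wedge[(1-\alpha-\beta)/\alpha]}n^{-1}, n \ge 1\}$ is non-increasing, $\sum_n a_n^{1\wedge[(1-\alpha-\beta)/\alpha]}n^{-1} < +\infty$, and $\lim_n a_n = 0$.

Proof. Choose $q > 1$ such that $\beta + \alpha/q < 1 - \alpha$. Fix $\varepsilon > 0$. From Theorem 4.10, there exist constants $C, N$ such that for any $n \ge 1$, $l \ge 0$, $(x,\theta) \in \mathsf{X}\times\Theta$,
$$\big|g_{a_n}^{(l)}(x,\theta)\big| \le C\,a_n^{1/q-1}V^{\beta+\alpha/q}(x) + 4\varepsilon N\big(1-(1-a_n)^N\big)^{-1}.$$
Hence, for $n$ large enough such that $(1 - a_n) \ge 1/2$,
$$\begin{aligned}
a_nn^{-1}(1-a_n)^{-1}\Big|\sum_{k=1}^n\mathbb{E}_{\xi_1,\xi_2}\big[g_{a_n}^{(k+1)}(X_{k+1},\theta_{k+1})\,|\,\mathcal{F}_k\big]\Big| &\le 8a_n\varepsilon N\big(1-(1-a_n)^N\big)^{-1} + 2C\,a_n^{1/q}n^{-1}\sum_{k=1}^n\mathbb{E}_{\xi_1,\xi_2}\big[V^{\beta+\alpha/q}(X_{k+1})\,|\,\mathcal{F}_k\big]\\
&\le 8a_n\varepsilon N\big(1-(1-a_n)^N\big)^{-1} + 2C\,a_n^{1/q}n^{-1}\sum_{k=1}^nV^{1-\alpha}(X_k) + 2C\,a_n^{1/q}b,
\end{aligned}$$
where we used $\beta + \alpha/q < 1 - \alpha$ and Proposition 4.6(i) in the last inequality. Since $\lim_n a_n = 0$ and $\lim_n a_n\varepsilon N(1-(1-a_n)^N)^{-1} = \varepsilon$, we only have to prove that $a_n^{1/q}n^{-1}\sum_{k=1}^nV^{1-\alpha}(X_k)$ converges to zero $\mathbb{P}_{\xi_1,\xi_2}$-a.s. By the Kronecker Lemma (see e.g. (Hall and Heyde, 1980, Section 2.6)), this amounts to proving that $\sum_{k\ge1}a_k^{1/q}k^{-1}V^{1-\alpha}(X_k)$ is finite a.s. This property holds upon noting that, by Proposition 4.7 and Proposition 4.6(i),
$$\mathbb{E}_{\xi_1,\xi_2}\Big[\sum_{k\ge n}a_k^{1/q}k^{-1}V^{1-\alpha}(X_k)\Big] \le c\Big(a_n^{1/q}n^{-1}\,\mathbb{E}_{\xi_1,\xi_2}\big[V(X_n)\big] + \sum_{k\ge n}a_k^{1/q}k^{-1}\Big) \le c\Big(a_n^{1/q}n^{-1}\big(\xi_1(V) + bn\big) + \sum_{k\ge n}a_k^{1/q}k^{-1}\Big),$$
and the rhs tends to zero under the stated assumptions. □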



4.6. Proof of Proposition 2.6. We only give a sketch of the proof, since it is very similar to that of Theorem 2.5. We start with proving a result similar to Theorem 4.10. Since $\mathcal{D} = \mathsf{X}$, the sequence $\{\tau^k, k \ge 0\}$ is deterministic and $\tau^{k+1} = \tau^k + N + 1$. By adapting the proof of Theorem 4.10 ($f$ is bounded and $\mathcal{D} = \mathsf{X}$), we establish that for any $\varepsilon > 0$, there exists an integer $n \ge 2$ such that for any $0 < a < 1$, any bounded function $f$, $l \ge 0$ and $(x,\theta) \in \mathsf{X}\times\Theta$,
$$\big(|\bar f|_1\big)^{-1}\big|g_a^{(l)}(x,\theta)\big| \le n + \varepsilon\big(1-(1-a)^n\big)^{-1}n.$$
We then introduce the martingale decomposition as in the proof of Theorem 2.5 and follow the same lines (with any $p > 1$).

Appendix A. Explicit control of convergence 

We provide sufficient conditions for the assumptions A1(ii) and A5. The technique relies on the explicit control of convergence of a transition kernel $P$ on a general state space $(\mathsf{T},\mathcal{B}(\mathsf{T}))$ to its stationary distribution $\pi$.



Proposition A.1. Let $P$ be a $\phi$-irreducible and aperiodic transition kernel on $(\mathsf{T},\mathcal{B}(\mathsf{T}))$.
(i) Assume that there exist a probability measure $\nu$ on $\mathsf{T}$, positive constants $\varepsilon, b, c$, a measurable set $C$, a measurable function $V : \mathsf{T} \to [1,+\infty)$ and $0 < \alpha \le 1$ such that
$$P(x,\cdot) \ge \mathbb{1}_C(x)\,\varepsilon\,\nu(\cdot), \qquad PV \le V - cV^{1-\alpha} + b\,\mathbb{1}_C. \qquad (19)$$
Then $P$ possesses an invariant probability measure $\pi$ and $\pi(V^{1-\alpha}) < +\infty$.
(ii) Assume in addition that $c\inf_{C^c}V^{1-\alpha} \ge b$, $\sup_C V < +\infty$ and $\nu(C) > 0$. Then there exists a constant $C$ depending upon $\sup_C V$, $\nu(C)$ and $\varepsilon, \alpha, b, c$ such that for any $0 \le \beta \le 1 - \alpha$ and $1 \le \kappa \le \alpha^{-1}(1-\beta)$,
$$(n+1)^{\kappa-1}\,\|P^n(x,\cdot) - \pi(\cdot)\|_{V^\beta} \le C\,V^{\beta+\kappa\alpha}(x). \qquad (20)$$

Proof. The conditions (19) imply that $V$ is unbounded off petite sets and that $P$ is recurrent. They also imply that $\{V < +\infty\}$ is full and absorbing: hence there exists a level set $\mathcal{D}$ of $V$ large enough such that $\nu(\mathcal{D}) > 0$. Following the same lines as in the proof of Proposition 4.8, we prove that $\sup_{\mathcal{D}}\mathbb{E}_x[\tau_{\mathcal{D}}] < +\infty$. The proof of (i) is concluded by (Meyn and Tweedie, 1993, Theorems 8.4.3, 10.0.1). The proof of (ii) is given e.g. in Fort and Moulines (2003) (see also Andrieu and Fort (2005); Douc et al. (2007)). □

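When $b \le c$, $c\inf_{C^c}V^{1-\alpha} \ge b$. Otherwise, it is easy to deduce the conditions of (ii) from conditions of the form (i).

Corollary A.2. Let $P$ be a $\phi$-irreducible and aperiodic transition kernel on $(\mathsf{T},\mathcal{B}(\mathsf{T}))$. Assume that there exist positive constants $b, c$, a measurable set $C$, an unbounded measurable function $V : \mathsf{T} \to [1,+\infty)$ and $0 < \alpha \le 1$ such that $PV \le V - cV^{1-\alpha} + b\,\mathbb{1}_C$. Assume in addition that the level sets of $V$ are 1-small. Then there exist a level set $\mathcal{D}$ of $V$, positive constants $\varepsilon_{\mathcal{D}}, c_{\mathcal{D}}$ and a probability measure $\nu_{\mathcal{D}}$ such that
$$P(x,\cdot) \ge \mathbb{1}_{\mathcal{D}}(x)\,\varepsilon_{\mathcal{D}}\,\nu_{\mathcal{D}}(\cdot), \qquad PV \le V - c_{\mathcal{D}}V^{1-\alpha} + b\,\mathbb{1}_{\mathcal{D}},$$
and $\sup_{\mathcal{D}}V < +\infty$, $\nu_{\mathcal{D}}(\mathcal{D}) > 0$, and $c_{\mathcal{D}}\inf_{\mathcal{D}^c}V^{1-\alpha} \ge b$.

Proof. For any $0 < \gamma < 1$, $PV \le V - \gamma cV^{1-\alpha} + b\,\mathbb{1}_{\mathcal{D}_\gamma}$ with $\mathcal{D}_\gamma = \{V^{1-\alpha} \le b[c(1-\gamma)]^{-1}\}$. Hence, $\sup_{\mathcal{D}_\gamma}V < +\infty$; and for $\gamma$ close to 1, we have $\gamma c\inf_{\mathcal{D}_\gamma^c}V^{1-\alpha} \ge b$. Finally, the drift condition (19) implies that the set $\{V < +\infty\}$ is full and absorbing, and thus the level sets $\{V \le d\}$ are accessible for any $d$ large enough. □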

The 1-smallness assumption is usually made for convenience and is not restrictive. In the case where the level sets are petite (and thus $m$-small for some $m \ge 1$), the explicit upper bounds become intricate and are never detailed in the literature (at least in the polynomial case). Nevertheless, it is a recognized fact that the bounds derived in the case $m = 1$ can be extended to the case $m > 1$.
Appendix B. $L^p$-martingales and the Chow-Birnbaum-Marshall inequality

We deal with martingales and martingale arrays in the paper using the following two results.

Lemma B.1. Let $\{(D_k,\mathcal{F}_k), k \ge 1\}$ be a martingale difference sequence and $M_n = \sum_{k=1}^nD_k$. For any $p > 1$,
$$\mathbb{E}\big[|M_n|^p\big] \le C\,n^{(p/2\vee1)-1}\sum_{k=1}^n\mathbb{E}\big[|D_k|^p\big], \qquad (21)$$
where $C = (18pq^{1/2})^p$, $p^{-1} + q^{-1} = 1$.

Proof. By Burkholder's inequality (Hall and Heyde (1980), Theorem 2.10) applied to the martingale $\{(M_n,\mathcal{F}_n), n \ge 1\}$, we get
$$\mathbb{E}\big[|M_n|^p\big] \le C\,\mathbb{E}\Big[\Big(\sum_{k=1}^n|D_k|^2\Big)^{p/2}\Big],$$
where $C = (18pq^{1/2})^p$, $p^{-1} + q^{-1} = 1$. The proof follows by noting that
$$\Big(\sum_{k=1}^n|D_k|^2\Big)^{p/2} \le n^{(p/2\vee1)-1}\sum_{k=1}^n|D_k|^p. \qquad (22)$$
To prove (22), note that if $1 < p \le 2$, the subadditivity inequality $(a+b)^\gamma \le a^\gamma + b^\gamma$, which holds true for all $a, b \ge 0$ and $0 < \gamma \le 1$, implies that $(\sum_{k=1}^n|D_k|^2)^{p/2} \le \sum_{k=1}^n|D_k|^p$. If $p > 2$, Hölder's inequality gives $(\sum_{k=1}^n|D_k|^2)^{p/2} \le n^{p/2-1}(\sum_{k=1}^n|D_k|^p)$. □
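Inequality (22) is deterministic, so it can be checked directly on a few vectors. Equality holds for a single nonzero entry when $p \le 2$ and for entries of equal modulus when $p \ge 2$; the vectors and exponents below are arbitrary test values:

```python
def lhs_22(d, p):
    """(sum_k |D_k|^2)^(p/2)."""
    return sum(x * x for x in d) ** (p / 2.0)


def rhs_22(d, p):
    """n^{(p/2 v 1) - 1} * sum_k |D_k|^p."""
    n = len(d)
    return n ** (max(p / 2.0, 1.0) - 1.0) * sum(abs(x) ** p for x in d)


vectors = [[1.0, -2.0, 0.5], [3.0, 0.0, 0.0, 1.0], [1.0] * 10]
for p in (1.2, 2.0, 3.0, 5.0):
    for d in vectors:
        assert lhs_22(d, p) <= rhs_22(d, p) * (1 + 1e-12)
print("inequality (22) holds on all test vectors")
```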

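Lemma B.2 can be found in Atchadé (2009) and provides a generalization of the classical Chow-Birnbaum-Marshall inequality.

Lemma B.2. Let $\{(D_{n,i},\mathcal{F}_{n,i}), 1 \le i \le n\}$, $n \ge 1$, be a martingale-difference array and $\{c_n, n \ge 1\}$ a non-increasing sequence of positive numbers. Assume that $\mathcal{F}_{n,i} = \mathcal{F}_i$ for all $i, n$. Define
$$S_{n,k} \stackrel{\text{def}}{=} \sum_{i=1}^kD_{n,i} \ \text{ if } 1 \le k \le n, \qquad S_{n,k} \stackrel{\text{def}}{=} \sum_{i=1}^nD_{n,i} + \sum_{j=n+1}^kD_{j,j} \ \text{ if } k > n;$$
$$R_n \stackrel{\text{def}}{=} \sum_{j=1}^{n-1}\big(D_{n,j} - D_{n-1,j}\big).$$
For $n \le N$, $p \ge 1$ and $\lambda > 0$,
$$2^{-p}\lambda^p\,\mathbb{P}\Big(\max_{n\le m\le N}c_m\Big|\sum_{i=1}^mD_{m,i}\Big| > \lambda\Big) \le c_N^p\,\mathbb{E}\big(|S_{n,N}|^p\big) + \sum_{k=n}^{N-1}\big(c_k^p - c_{k+1}^p\big)\mathbb{E}\big(|S_{n,k}|^p\big) + \mathbb{E}\Big[\Big(\sum_{j=n+1}^Nc_j|R_j|\Big)^p\Big]. \qquad (23)$$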
Appendix C. Proofs of Section 3.2

In the proofs, C will denote a generic finite constant whose actual value might change 
from one appearance to the next. The proofs below differ from earlier works (see e.g. 
Fort and Moulines (2000); Douc et al. (2004)) since q is not assumed to be compactly 
supported. 

C.l. Proof of Lemma 3.3. 

Lemma C.l. Assume Dl-2. For all x large enough and \z\ < r]\x\^ , t i-^ Vs{x + tz) is 
twice continuously differentiable on [0, 1] . There exist a constant C < +oo and a positive 
function e such that \iniM_^f^ e{x) = 0, such that for all x large enough, \z\ < r]\x\'" and 
s < s^, 

sup |VV,(x + iz)j <C7sT4(x)|x|2(™"i)(s + e(x)) . 

iG[0,l] 

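Proof. We have $|x + z| \ge |x| - \eta |x|^{\upsilon} \ge (1 - \eta)|x|^{\upsilon}$. Hence $t \mapsto V_s(x + tz)$ is twice continuously differentiable on $[0,1]$ for $|x|$ large enough. Moreover,

$$\left|\nabla^2 V_s(x+tz)\right| \le s V_s(x) \, \frac{V_s(x+tz)}{V_s(x)} \left|\nabla \ln \pi(x+tz) \nabla \ln \pi(x+tz)^{T}\right| \left( \frac{|\nabla^2 \ln \pi(x+tz)|}{|\nabla \ln \pi(x+tz) \nabla \ln \pi(x+tz)^{T}|} + s \right).$$

Under the stated assumptions, for any $x$ large enough and $|z| \le \eta |x|^{\upsilon}$,

$$\sup_{t \in [0,1]} \frac{|\nabla^2 \ln \pi(x+tz)|}{|\nabla \ln \pi(x+tz) \nabla \ln \pi(x+tz)^{T}|} \le \frac{D_2}{d_1^2 (1-\eta)^{m}} \, |x|^{-m\upsilon}$$

and

$$\sup_{t \in [0,1]} \left|\nabla \ln \pi(x+tz) \nabla \ln \pi(x+tz)^{T}\right| \le D_1^2 \, |x|^{2(m-1)} \left(1 - \eta |x|^{\upsilon-1}\right)^{2(m-1)}.$$

Finally,

$$\sup_{t \in [0,1]} \left(\frac{\pi(x+tz)}{\pi(x)}\right)^{-s} \le 1 + s_\star D_1 |z| \sup_{t \in [0,1]} |x+tz|^{m-1} \sup_{t \in [0,1]} \left(\frac{\pi(x+tz)}{\pi(x)}\right)^{-s}.$$

This yields the desired result, since $|z| \, |x+tz|^{m-1} \le \eta |x|^{\upsilon+m-1} \left(1 - \eta |x|^{\upsilon-1}\right)^{m-1}$ is arbitrarily small for $x$ large enough. □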
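The mechanism of Lemma C.1 can be seen on a concrete case. Take the one-dimensional target $\ln \pi(x) = -(1+x^2)^{m/2}$ (up to a constant) and $V_s = \pi^{-s}$. Both are assumptions made only for this illustration. Then $V_s'' = s V_s \left(s \, (\ln \pi)'^2 - (\ln \pi)''\right)$. Since $|(\ln\pi)'(x)| \sim m|x|^{m-1}$ and the $(\ln\pi)''$ contribution is of smaller order $|x|^{-m}$, the ratio $V_s''/(s V_s |x|^{2(m-1)})$ should approach $s m^2$. The sketch below uses hypothetical helper names. It checks the closed form against a central finite difference at a moderate point. It then evaluates the normalised ratio far in the tail without forming $V_s$, which would overflow there.

```python
import math

# Hypothetical 1-D sub-exponential target: ln pi(x) = -(1 + x^2)^(m/2), 0 < m < 1.
def dlog_pi(x, m):
    return -m * x * (1.0 + x * x) ** (m / 2 - 1)

def d2log_pi(x, m):
    u = 1.0 + x * x
    return -m * u ** (m / 2 - 1) - m * (m - 2) * x * x * u ** (m / 2 - 2)

def V(x, s, m):
    # Hypothetical drift function V_s = pi^{-s} (normalising constant ignored).
    return math.exp(s * (1.0 + x * x) ** (m / 2))

def hessian_over_sV(x, s, m):
    # V_s''(x) / (s V_s(x)) = s (ln pi)'(x)^2 - (ln pi)''(x); avoids the overflow-prone exp.
    return s * dlog_pi(x, m) ** 2 - d2log_pi(x, m)

def second_difference(f, x, h=1e-3):
    # Central finite-difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

s, m = 0.1, 0.5
x0 = 5.0
fd = second_difference(lambda y: V(y, s, m), x0)
exact = s * V(x0, s, m) * hessian_over_sV(x0, s, m)
print(f"finite difference {fd:.8f}, closed form {exact:.8f}")
for x in (1e2, 1e4, 1e6, 1e8):
    # Normalised by |x|^{2(m-1)} as in Lemma C.1; tends to s * m**2 for this target.
    print(f"{x:.0e}  {hessian_over_sV(x, s, m) / x ** (2 * (m - 1)):.6f}")
```

The slow approach to $s m^2$ mirrors the factor $s + \epsilon(x)$ in the lemma, with $\epsilon(x)$ of order $|x|^{-m}$ here.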

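We now turn to the proof of Lemma 3.3. For $x \in \mathsf{X}$, define $R(x) := \{y \in \mathsf{X} : \pi(y) < \pi(x)\}$ and $R(x) - x := \{y - x : y \in R(x)\}$. Then

$$P_\theta V_s(x) - V_s(x) = \int \left(V_s(x+z) - V_s(x)\right) q_\theta(z) \, \mu_{Leb}(dz) + \int_{R(x)-x} \left(V_s(x+z) - V_s(x)\right) \left(\frac{\pi(x+z)}{\pi(x)} - 1\right) q_\theta(z) \, \mu_{Leb}(dz).$$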
If $x$ remains in a compact set $\mathcal{C}$, then D2(ii) and the continuity of $x \mapsto V_s(x)$ give $V_s(x+z) \le C\left(1 + \exp(s D_0 |z|^{m})\right)$. It follows that

$$\sup_{\theta \in \Theta} \sup_{x \in \mathcal{C}} \left\{P_\theta V_s(x) - V_s(x)\right\} \le C \sup_{\theta \in \Theta} \int \left(1 + \exp(s D_0 |z|^{m})\right) q_\theta(z) \, \mu_{Leb}(dz) < +\infty.$$

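More generally, let $x$ be large enough. Define $l(x) := \log \pi(x)$,

$$R_V(x,z) := V_s(x+z) - V_s(x) + s V_s(x) \langle z, \nabla l(x) \rangle, \qquad R_\pi(x,z) := \frac{\pi(x+z)}{\pi(x)} - 1 - \langle z, \nabla l(x) \rangle.$$

Since the mean of $q_\theta$ is zero, we can write $P_\theta V_s(x) - V_s(x) = I_1(x,\theta,s) + I_2(x,\theta,s) + I_3(x,\theta,s)$, where

$$I_1(x,\theta,s) = -s V_s(x) \int_{R(x)-x} \langle z, \nabla l(x) \rangle^{2} \, q_\theta(z) \, \mu_{Leb}(dz),$$

$$I_2(x,\theta,s) = \int R_V(x,z) \, q_\theta(z) \, \mu_{Leb}(dz) + \int_{R(x)-x} R_V(x,z) \left(\frac{\pi(x+z)}{\pi(x)} - 1\right) q_\theta(z) \, \mu_{Leb}(dz),$$

and

$$I_3(x,\theta,s) = -s V_s(x) \int_{R(x)-x} R_\pi(x,z) \, \langle z, \nabla l(x) \rangle \, q_\theta(z) \, \mu_{Leb}(dz).$$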
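Once $\int z \, q_\theta(z) \, \mu_{Leb}(dz) = 0$, the split into $I_1$, $I_2$ and $I_3$ is purely algebraic, so it can be checked numerically. The sketch below assumes, for illustration only:

- a one-dimensional target with $l(x) = -(1+x^2)^{m/2}$, so that $R(x) - x = \{z : |x+z| > |x|\}$;
- $V_s = \pi^{-s}$;
- $q_\theta = \mathcal{N}(0, \sigma^2)$.

It evaluates $P_\theta V_s(x) - V_s(x)$ in two ways on a symmetric quadrature grid. The first is direct, for the Metropolis kernel with acceptance probability $\min(1, \pi(x+z)/\pi(x))$ as in the first display of this proof. The second sums $I_1 + I_2 + I_3$. The function names are hypothetical.

```python
import math
import numpy as np

def l(x, m):
    # Hypothetical log-target, up to an additive constant.
    return -(1.0 + x ** 2) ** (m / 2)

def dl(x, m):
    return -m * x * (1.0 + x ** 2) ** (m / 2 - 1)

def trap(f, dz):
    # Trapezoidal rule on a uniform grid.
    return dz * (f.sum() - 0.5 * (f[0] + f[-1]))

def drift_decomposition(x, s, m, sigma, half_width=12.0, n=20001):
    z = np.linspace(-half_width * sigma, half_width * sigma, n)  # symmetric grid
    dz = z[1] - z[0]
    q = np.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    Vx = math.exp(-s * l(x, m))                  # V_s = pi^{-s}
    Vxz = np.exp(-s * l(x + z, m))
    ratio = np.exp(l(x + z, m) - l(x, m))        # pi(x+z) / pi(x)
    in_R = ratio < 1.0                           # indicator of z in R(x) - x
    g = dl(x, m)                                 # l'(x)
    R_V = Vxz - Vx + s * Vx * z * g
    R_pi = ratio - 1.0 - z * g
    direct = trap((Vxz - Vx) * np.minimum(1.0, ratio) * q, dz)
    I1 = -s * Vx * trap(np.where(in_R, (z * g) ** 2 * q, 0.0), dz)
    I2 = trap(R_V * q, dz) + trap(np.where(in_R, R_V * (ratio - 1.0) * q, 0.0), dz)
    I3 = -s * Vx * trap(np.where(in_R, R_pi * z * g * q, 0.0), dz)
    return direct, I1, I2, I3

direct, I1, I2, I3 = drift_decomposition(x=50.0, s=0.1, m=0.5, sigma=1.0)
print(direct, I1 + I2 + I3, (I1, I2, I3))
```

The two evaluations differ only by $s V_s(x) \, l'(x)$ times the quadrature value of $\int z \, q_\theta$, which vanishes up to rounding on a symmetric grid.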

C.1.1. First term. It follows from (Fort and Moulines, 2000, Lemma B.3 and proof of Proposition 3) that, under D2(i), there exists $b > 0$ such that for all $\theta \in \Theta$,

$$\int_{R(x)-x} \langle z, \nabla l(x) \rangle^{2} \, q_\theta(z) \, \mu_{Leb}(dz) \ge b \, |\nabla l(x)|^{2}.$$

Hence, $\sup_{\theta \in \Theta} I_1(x,\theta,s) \le -s V_s(x) \, b \, d_1^2 |x|^{2(m-1)}$.
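The lower bound in C.1.1 can be illustrated in a special case. Assume, for illustration only, an isotropic target $l(x) = -(1+|x|^2)^{m/2}$. Then $\nabla l(x)$ is parallel to $u = x/|x|$, and $R(x) - x = \{z : |x+z| > |x|\}$ contains the half-space $\{z : \langle z, x\rangle > 0\}$. By the symmetry of the Gaussian $q_\theta = \mathcal{N}(0, \Sigma)$, the ratio of the integral to $|\nabla l(x)|^2$ is therefore at least $\tfrac12 u^{T} \Sigma u \ge \tfrac12 \lambda_{\min}(\Sigma)$. The sketch below estimates this ratio by Monte Carlo in two dimensions. The helper `first_term_ratio` and the matrix `Sigma` are hypothetical. It does not cover the non-isotropic targets allowed by D2(i).

```python
import numpy as np

def first_term_ratio(x, Sigma, m, n_samples=200_000, seed=0):
    """Monte Carlo estimate of
        int_{R(x)-x} <z, grad l(x)>^2 q(z) dz / |grad l(x)|^2
    for the hypothetical isotropic target l(x) = -(1 + |x|^2)^(m/2), for which
    R(x) - x = {z : |x + z| > |x|}, and q = N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    z = rng.multivariate_normal(np.zeros(x.size), Sigma, size=n_samples)
    grad = -m * x * (1.0 + x @ x) ** (m / 2 - 1)   # gradient of l at x
    shifted = x + z
    in_R = np.einsum("ij,ij->i", shifted, shifted) > x @ x
    vals = np.where(in_R, (z @ grad) ** 2, 0.0)
    return vals.mean() / (grad @ grad)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
u = np.array([0.6, 0.8])                            # unit direction of x
for r in (10.0, 100.0, 1000.0):
    ratio = first_term_ratio(r * u, Sigma, m=0.5)
    print(r, ratio, 0.5 * u @ Sigma @ u)            # estimate vs. half-space lower bound
```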

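C.1.2. Second term. For $z \in R(x) - x$ we have $\pi(x+z) < \pi(x)$, so that $|\pi(x+z)/\pi(x) - 1| \le 1$. Therefore $|I_2(x,\theta,s)| \le 2 \int |R_V(x,z)| \, q_\theta(z) \, \mu_{Leb}(dz)$. By Lemma C.1, there exists $C < +\infty$, independent of $s$ for $s \le s_\star$, such that for any $|z| \le \eta |x|^{\upsilon}$,

$$|R_V(x,z)| \le C \, s \, V_s(x) \, |x|^{2(m-1)} |z|^{2} \left(s + \epsilon(x)\right).$$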

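This implies that there exists a constant $C < +\infty$, independent of $s$ for $s \le s_\star$, such that

$$\int |R_V(x,z)| \, q_\theta(z) \, \mu_{Leb}(dz) \le C s V_s(x) |x|^{2(m-1)} \left(s + \epsilon(x)\right) \int |z|^{2} q_\theta(z) \, \mu_{Leb}(dz) + V_s(x) \int_{\{z : |z| > \eta |x|^{\upsilon}\}} \frac{V_s(x+z)}{V_s(x)} \, q_\theta(z) \, \mu_{Leb}(dz) + C V_s(x) |x|^{m-1} \int_{\{z : |z| > \eta |x|^{\upsilon}\}} |z| \, q_\theta(z) \, \mu_{Leb}(dz).$$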

There exists a constant $C$ such that, for $\theta \in \Theta$ and $s \le s_\star$, the first term on the right-hand side is upper bounded by $C s V_s(x) |x|^{2(m-1)} (s + \epsilon(x))$. Under D3, the second term is upper bounded by $V_s(x) |x|^{2(m-1)} \epsilon(x)$, where $\lim_{|x| \to \infty} \epsilon(x) = 0$ uniformly in $\theta \in \Theta$ and in $s \le s_\star$. Since $q_\theta$ is a multivariate Gaussian density, there exists $\lambda_\star > 0$ such that $\sup_{\theta \in \Theta} \int \exp(\lambda_\star |z|^{2}) \, q_\theta(z) \, \mu_{Leb}(dz) < +\infty$. Under D3, the third term is upper bounded by $C V_s(x) |x|^{m-1} \exp(-\lambda \eta^{2} |x|^{2\upsilon})$ for some $\lambda \in (0, \lambda_\star)$, uniformly in $\theta \in \Theta$ and in $s \le s_\star$. Hence there exists $C_\star < \infty$ such that for any $s \le s_\star$,

$$\sup_{\theta \in \Theta} |I_2(x,\theta,s)| \le C_\star V_s(x) |x|^{2(m-1)} \left(s^{2} + \epsilon(x)\right),$$

for a positive function $\epsilon$ that is independent of $s$ and satisfies $\lim_{|x| \to \infty} \epsilon(x) = 0$.
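The exponential bound on the third term follows from a Chernoff-type estimate. One way to spell it out, for $r > 0$ and $0 < \lambda < \lambda_\star$, is the following. The constant $C_\lambda$ is introduced here only for this sketch.

$$\int_{\{z : |z| > r\}} |z| \, q_\theta(z) \, \mu_{Leb}(dz) \le e^{-\lambda r^{2}} \int |z| \, e^{\lambda |z|^{2}} q_\theta(z) \, \mu_{Leb}(dz) \le C_\lambda \, e^{-\lambda r^{2}} \sup_{\theta \in \Theta} \int e^{\lambda_\star |z|^{2}} q_\theta(z) \, \mu_{Leb}(dz), \qquad C_\lambda := \sup_{u \ge 0} u \, e^{-(\lambda_\star - \lambda) u^{2}} < \infty.$$

Taking $r = \eta |x|^{\upsilon}$ gives the factor $\exp(-\lambda \eta^{2} |x|^{2\upsilon})$. This factor is negligible compared with any power of $|x|$, so the third term can be absorbed into $V_s(x) |x|^{2(m-1)} \epsilon(x)$.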

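C.1.3. Third term. Following the same lines as in the control of $I_2(x,\theta,s)$, one can prove that

$$I_3(x,\theta,s) \le s V_s(x) D_1 |x|^{m-1} \int_{\{z : |z| > \eta |x|^{\upsilon}\}} |z| \left(1 + D_1 |z| \, |x|^{m-1}\right) q_\theta(z) \, \mu_{Leb}(dz) + C V_s(x) |x|^{3(m-1)} \int_{\{z : |z| \le \eta |x|^{\upsilon}\}} |z|^{3} q_\theta(z) \, \mu_{Leb}(dz) \le C V_s(x) |x|^{2(m-1)} \epsilon(x)$$

for a positive function $\epsilon$ that is independent of $s$ and $\theta$ and satisfies $\lim_{|x| \to \infty} \epsilon(x) = 0$.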

C.1.4. Conclusion. Let $\alpha \in (0,1)$. Choose $s$ small enough that $c_\star = b d_1^2 - C_\star s > 0$. Combining the above calculations then gives

$$\sup_{\theta \in \Theta} P_\theta V_s(x) \le V_s(x) - c_\star V_s(x) |x|^{2(m-1)} + b \, \mathbb{1}_{\mathcal{C}}(x) \qquad (24)$$

$$\le V_s(x) - 0.5 \, c_\star V_s^{\alpha}(x) + b \, \mathbb{1}_{\mathcal{C}}(x) \qquad (25)$$

for a compact set $\mathcal{C}$. This proves A2(ii) and A4. A5 follows from the results of Appendix A, and A2(iii) and A3 follow from Lemma 3.2.
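One ingredient of the passage from (24) to (25) is the comparison $V_s(x) |x|^{2(m-1)} \ge V_s^{\alpha}(x)$. This is equivalent to $V_s^{1-\alpha}(x) |x|^{2(m-1)} \ge 1$, which holds once $|x|$ is large. The sketch below assumes, for illustration only, $V_s(x) = \exp(s |x|^{m})$. It computes the radius beyond which the comparison holds, using the hypothetical helpers `log_gap` and `drift_radius`. For small $s$ this radius, and hence the compact set $\mathcal{C}$, can be very large.

```python
import math

def log_gap(r, s, m, alpha):
    # log of V_s(x)^(1 - alpha) * |x|^(2(m - 1)) at |x| = r, assuming V_s(x) = exp(s |x|^m)
    return (1.0 - alpha) * s * r ** m + 2.0 * (m - 1.0) * math.log(r)

def drift_radius(s, m, alpha, rtol=1e-12):
    """Smallest r0 with log_gap(r) >= 0 for all r >= r0 (assumes 0 < m < 1, 0 < alpha < 1).

    log_gap decreases on (0, r_min] and increases on [r_min, inf), so beyond
    r_min there is a single root, found here by bisection."""
    a = (1.0 - alpha) * s
    r_min = (2.0 * (1.0 - m) / (a * m)) ** (1.0 / m)    # unique critical point
    if log_gap(r_min, s, m, alpha) >= 0.0:
        return 0.0                                      # comparison holds for every r > 0
    lo, hi = r_min, 2.0 * r_min
    while log_gap(hi, s, m, alpha) < 0.0:               # bracket the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > rtol * hi:                          # invariant: gap(lo) < 0 <= gap(hi)
        mid = 0.5 * (lo + hi)
        if log_gap(mid, s, m, alpha) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

for s in (0.5, 0.1, 0.05):
    print(s, drift_radius(s, m=0.5, alpha=0.5))
```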

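C.2. Proof of Lemma 3.4. An easy modification of the proof of (Andrieu and Moulines, 2006, Proposition 11), adjusted for the difference in the drift function, shows that $D(\theta, \theta') \le 2 \int_{\mathbb{R}^d} |q_{e^{c'}\Sigma'}(x) - q_{e^{c}\Sigma}(x)| \, \mu_{Leb}(dx)$. Applying (Andrieu and Moulines, 2006, Lemma 12) then gives $D(\theta, \theta') \le C \, |e^{c'}\Sigma' - e^{c}\Sigma|_s$, where $C$ is a finite constant depending upon the compact set $\Theta$. Hereafter, $C$ is finite and its value may change upon each appearance. For any $l, n \ge 0$, $\epsilon > 0$, $x \in \mathbb{R}^d$ and $\theta \in \Theta$,

$$\mathbb{P}^{(l)}_{x,\theta}\left(D(\theta_n, \theta_{n+1}) > \epsilon\right) \le \epsilon^{-1} \, \mathbb{E}^{(l)}_{x,\theta}\left[D(\theta_n, \theta_{n+1})\right] \le C \epsilon^{-1} \, \mathbb{E}^{(l)}_{x,\theta}\left[|c_{n+1} - c_n| + |\Sigma_{n+1} - \Sigma_n|_s\right] \le C \epsilon^{-1} (l + n + 1)^{-1} \left(1 + \mathbb{E}^{(l)}_{x,\theta}\left[|X_{n+1}|\right] + \mathbb{E}^{(l)}_{x,\theta}\left[|X_{n+1}|^{2}\right]\right).$$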

D2(ii) implies that there is $C < \infty$ such that $|x|^{2} \le C \, \phi(V_s(x))$ for all $x \in \mathsf{X}$, where $\phi(t) = [\ln t]^{2/m}$. From the drift condition (Lemma 3.3), Proposition 4.6(i) and the concavity of $\phi$, we deduce that there exists $C$ such that $\mathbb{E}^{(l)}_{x,\theta}\left[|X_n|^{2}\right] \le C \, [\ln V_s(x)]^{2/m} \, [\ln n]^{2/m}$.
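We conclude as follows. For any probability measure $\xi_1$ such that $\xi_1([\ln V_s]^{2/m}) < +\infty$, $\lim_{n \to \infty} \mathbb{P}^{(l)}_{\xi_1,\theta}\left(D(\theta_n, \theta_{n+1}) > \epsilon\right) = 0$. Moreover, for any level set $\mathcal{V}$ of $V_s$,

$$\lim_{n \to \infty} \sup_{l \ge 0} \sup_{x \in \mathcal{V}, \, \theta \in \Theta} \mathbb{P}^{(l)}_{x,\theta}\left(D(\theta_n, \theta_{n+1}) > \epsilon\right) = 0.$$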
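The factor $(l+n+1)^{-1}$ in the bound on $\mathbb{P}^{(l)}_{x,\theta}(D(\theta_n, \theta_{n+1}) > \epsilon)$ comes from the step sizes of the stochastic approximation recursion. The sketch below is a hedged illustration with the following assumptions, none of which is the algorithm analysed in the paper:

- a simplified one-dimensional Adaptive Metropolis recursion with step size $1/(l+n+1)$;
- no projection onto a compact set, and a frozen scale $c$;
- a small regulariser added to the proposal variance;
- the hypothetical target $\ln \pi(x) = -(1+x^2)^{m/2}$.

It records $(l+n+1)\,|\Sigma_{n+1} - \Sigma_n|$ next to the deterministic bound $|X_{n+1} - \mu_n|^{2} + \Sigma_n$. Taking expectations of bounds of this type produces the moments $\mathbb{E}[|X_{n+1}|^{2}]$ used above. The function name `am_increments` is hypothetical.

```python
import numpy as np

def am_increments(n_iter, l=0, m=0.5, scale=2.38 ** 2, eps=1e-6, seed=0):
    """Simplified 1-D Adaptive Metropolis (hypothetical sketch: no projection, frozen scale).

    Returns rows ((l+n+1)|Sigma_{n+1} - Sigma_n|, (X_{n+1} - mu_n)^2 + Sigma_n)."""
    rng = np.random.default_rng(seed)
    log_pi = lambda y: -(1.0 + y * y) ** (m / 2)
    x, mu, sig2 = 0.0, 0.0, 1.0
    rows = []
    for n in range(n_iter):
        # Random-walk Metropolis step with proposal variance scale * (Sigma_n + eps).
        y = x + rng.normal(0.0, np.sqrt(scale * (sig2 + eps)))
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        gamma = 1.0 / (l + n + 1)                 # step size of the recursion
        dev = x - mu                              # X_{n+1} - mu_n
        sig2_new = sig2 + gamma * (dev * dev - sig2)
        rows.append(((l + n + 1) * abs(sig2_new - sig2), dev * dev + sig2))
        mu, sig2 = mu + gamma * dev, sig2_new
    return np.array(rows)

rows = am_increments(5000, l=10)
print(rows[-5:])
```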